1 Introduction

The asymptotic behaviour of empirical processes has been studied for more than 60 years. 
The first rigorous result was the empirical process central limit theorem for i.i.d. data, 
established by Donsker (1952). This theorem, conjectured by Doob (1949), made it possible 
to derive the asymptotic distribution of a large number of test statistics and estimators 
that can be represented as functionals of the empirical process, by an application of the 
continuous mapping theorem. Among the examples are the Kolmogorov-Smirnov goodness
of fit test, the Cramér-von Mises $\omega^2$ criterion, and more generally von Mises statistics.

Ciesielski and Kesten (1962) were among the first to extend Donsker's empirical process
CLT to weakly dependent data, studying the empirical distribution of remainders in the
dyadic expansion of a random number $\omega \in [0, 1]$. Billingsley (1968) proved the first general
result for dependent data, namely an empirical process CLT for data that can be represented 
as functionals of a mixing process. For an overview of the literature on empirical processes 
of dependent data, see Dehling and Philipp (2002). 

Müller (1970), and independently Kiefer (1972), initiated the study of the sequential
empirical process, defined as

$$U_n(x,t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt]} \left( 1_{\{X_i \le x\}} - F(x) \right),$$

where $F(x) = P(X_1 \le x)$. The process $U_n(x,t)$ is also known as the two-parameter
empirical process. Kiefer and Müller showed that for i.i.d. data, the sequential empirical
process converges in distribution to a mean zero Gaussian process $K(x,t)$ with covariance
structure

$$E(K(x,s)K(y,t)) = \min(s,t)\,\bigl(F(\min(x,y)) - F(x)F(y)\bigr).$$

The limit process $K(x,t)$ is called Kiefer process, or Kiefer-Müller process.

Komlós, Major, and Tusnády (1975), using a technique due to Csörgő and Révész (1975),
established nearly optimal bounds, the sharpest obtained so far, for the error in the
approximation of the sequential empirical process by the Kiefer process in the i.i.d. case.
For an overview of this topic, see the book by Csörgő and Révész (1981) or the survey
article by Gänssler and Stute (1979).

Many authors have studied extensions of the sequential empirical process CLT to dependent
data, e.g. Berkes and Philipp (1977) and Philipp and Pinzur (1980) for strongly mixing
processes and Berkes, Hörmann, and Schauer (2009) for S-mixing processes. Dehling and
Taqqu (1989) determined the asymptotic distribution of the sequential empirical process in
the case of long-range dependent data.

Recently, Dehling, Durieu, and Volný (2009) have developed a technique to prove empirical
process CLTs for Markov chains and dynamical systems that do not necessarily satisfy
any of the standard mixing conditions. The technique has been extended by Dehling and
Durieu (2011), Durieu and Tusche (2012) and Dehling, Durieu, and Tusche (2012) to
multivariate empirical processes and to empirical processes indexed by classes of functions.
Among the examples that could be treated by the new techniques are $\mathcal{B}$-geometrically
ergodic Markov chains, for which the empirical process CLT could be established. It is the
goal of the present paper to extend these techniques to the sequential empirical process.

Sequential empirical process CLTs can be applied to the study of the asymptotic distribution
of change-point tests based on the empirical distribution function. Suppose $(X_i)_{i \in \mathbb{N}}$
is a stochastic process with marginal distribution functions $\mu_1, \mu_2, \ldots$. Given the
observations $X_1, \ldots, X_n$, we want to test the hypothesis $H_0$: "the process is stationary with
marginal distribution $\mu$" against the alternative $H_A$: "there exists a $k^* \in \{1, \ldots, n-1\}$
such that $(X_1, \ldots, X_{k^*})$ and $(X_{k^*+1}, \ldots, X_n)$ are both stationary with different marginal
distributions". We propose the test statistic

$$T_n := \max_{0 \le k \le n} \sup_{x \in \mathbb{R}} \frac{k}{n}\Bigl(1 - \frac{k}{n}\Bigr) \sqrt{n}\, \bigl| F_k(x) - F_{k+1,n}(x) \bigr|,$$

where $F_k$ denotes the empirical distribution function of the observations $X_1, \ldots, X_k$ and
$F_{k+1,n}$ denotes the empirical distribution function of $X_{k+1}, \ldots, X_n$ (set $F_0 = F_{n+1,n} = 0$).
In order to determine the asymptotic distribution of $T_n$, we study the $\ell^\infty(\mathbb{R} \times [0,1])$-valued
process $R_n = (R_n(x,t))_{(x,t) \in \mathbb{R} \times [0,1]}$ given by

$$R_n(x,t) = \sqrt{n}\, t(1-t)\, \bigl( F_{[nt]}(x) - F_{[nt]+1,n}(x) \bigr).$$

As proved in the appendix (Theorem 7), assuming "convergence of the sequential empirical
process", we obtain under the null hypothesis $H_0$ that

$$R_n \rightsquigarrow \bigl( K(x,t) - tK(x,1) \bigr)_{(x,t) \in \mathbb{R} \times [0,1]},$$

where $K$ is the centred Gaussian process with covariance structure

$$\operatorname{Cov}(K(x,t), K(y,s)) = \min\{s,t\} \left\{ \sum_{k=0}^{\infty} \operatorname{Cov}\bigl(1_{\{X_0 \le x\}}, 1_{\{X_k \le y\}}\bigr) + \sum_{k=1}^{\infty} \operatorname{Cov}\bigl(1_{\{X_0 \le y\}}, 1_{\{X_k \le x\}}\bigr) \right\}.$$

This process is also referred to as a Kiefer process. Applying the continuous mapping
theorem to the supremum functional, we obtain the asymptotic distribution of the test
statistic $T_n$ under the null hypothesis, that is

$$T_n \rightsquigarrow \sup_{x \in \mathbb{R},\, t \in [0,1]} |K(x,t) - tK(x,1)|.$$

Note that this result in fact remains true for general $\mathcal{F}$-indexed empirical processes (see
Proposition 5).
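
As an illustration only, the change-point statistic can be evaluated on a finite sample. The sketch below assumes that $T_n$ is the weighted two-sample Kolmogorov-Smirnov maximum $\max_k \sup_x \frac{k}{n}(1 - \frac{k}{n}) \sqrt{n}\, |F_k(x) - F_{k+1,n}(x)|$ for real-valued observations; the function name `change_point_statistic` and the sample data are hypothetical and not taken from the paper.

```python
import math
import random
from bisect import bisect_right


def change_point_statistic(xs):
    """Return (T_n, k_hat): the weighted KS change-point statistic and its maximiser."""
    n = len(xs)
    grid = sorted(set(xs))  # the supremum over x is attained at a sample point
    best, k_hat = 0.0, 0
    for k in range(1, n):  # k = 0 and k = n carry weight zero
        left = sorted(xs[:k])
        right = sorted(xs[k:])
        weight = (k / n) * (1 - k / n) * math.sqrt(n)
        # bisect_right(sample, x) / len(sample) is the empirical distribution function at x
        sup_diff = max(
            abs(bisect_right(left, x) / k - bisect_right(right, x) / (n - k))
            for x in grid
        )
        if weight * sup_diff > best:
            best, k_hat = weight * sup_diff, k
    return best, k_hat


rng = random.Random(0)
stationary = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
t_null, _ = change_point_statistic(stationary)
t_alt, k_alt = change_point_statistic(shifted)
```

On such data one would expect `t_alt` to be much larger than `t_null`, with `k_alt` close to the true change point; critical values would have to come from the limit distribution described above, which this sketch does not compute.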

The remainder of this paper is organized as follows: in Section 2 we present sequential
empirical CLTs for $\mathcal{B}$-geometrically ergodic Markov chains (Theorem 1) and dynamical
systems that have a spectral gap on the transfer operator (Theorem 3). More abstract results,
such as a sequential empirical CLT for multiple mixing random variables (Theorem 5), are
stated in Section 3. These results will be the foundation for the later proof of the theorems
from Section 2. The proofs of the aforementioned results can be found in Sections 4, 5, and 6.
The asymptotic distribution of the test statistic $T_n$ (Proposition 5) is given in the appendix.

2 Sequential Empirical CLTs under Spectral Gap 
2.1 Definitions and Notations 

Let $(\mathcal{X}, \mathcal{A})$ be a measurable space. For a positive measure $\lambda$ on $\mathcal{X}$ and a $\lambda$-integrable
complex-valued function $f$ on $\mathcal{X}$, we will use the notation $\lambda f := \int_{\mathcal{X}} f \, d\lambda$. For $s \in [1, \infty)$, we
denote by $L^s(\lambda)$ the Lebesgue space of $s$-th power integrable complex-valued functions on
$\mathcal{X}$. This space is equipped with the norm $\|f\|_s = \lambda(|f|^s)^{1/s}$. Further, we denote the space
of essentially bounded measurable functions on $\mathcal{X}$ w.r.t. $\lambda$ by $L^\infty(\lambda)$ and the corresponding
(essential) supremum norm by $\|\cdot\|_\infty$. Note that these norms depend heavily on the choice
of the measure $\lambda$; however, throughout this paper it will always be clear which measure we
refer to.






Let $(X_i)_{i \in \mathbb{N}}$ be an $\mathcal{X}$-valued stationary stochastic process with marginal distribution $\mu$
and let $\mathcal{F}$ be a class of real-valued measurable functions on $\mathcal{X}$ which is uniformly bounded
w.r.t. the $\|\cdot\|_\infty$-norm. For $n \in \mathbb{N}^*$, we define the map $F_n : \mathcal{F} \to \mathbb{R}$, induced by the
empirical measure, by

$$F_n(f) := \frac{1}{n} \sum_{i=1}^{n} f(X_i), \qquad f \in \mathcal{F}.$$

The sequential empirical process of the $n$-th order of $(X_i)_{i \in \mathbb{N}}$ is then the $\mathcal{F} \times [0,1]$-indexed
process $U_n := (U_n(f,t))_{(f,t) \in \mathcal{F} \times [0,1]}$ given by

$$U_n(f,t) := \frac{[nt]}{\sqrt{n}} \bigl( F_{[nt]}(f) - \mu f \bigr) = \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt]} \bigl( f(X_i) - \mu f \bigr), \qquad (f,t) \in \mathcal{F} \times [0,1],$$

where $[\cdot]$ denotes the lower Gauss bracket, i.e. $[x] := \sup\{z \in \mathbb{Z} : z \le x\}$.
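
As a minimal illustration (not part of the original text), the definition can be evaluated for a single function $f$ and a finite sample. The function names, the sample, and the choice $\mu f = 1/2$ for the indicator of $[0, 0.5]$ under the uniform distribution on $[0,1]$ are assumptions made for this sketch. It evaluates both representations of $U_n(f,t)$ so that their agreement can be checked.

```python
import math


def sequential_empirical_process(xs, f, mu_f, t):
    """U_n(f, t) = n^{-1/2} * sum_{i <= [nt]} (f(X_i) - mu f)."""
    n = len(xs)
    m = math.floor(n * t)  # lower Gauss bracket [nt]
    return sum(f(x) - mu_f for x in xs[:m]) / math.sqrt(n)


def via_empirical_mean(xs, f, mu_f, t):
    """Same quantity written as [nt] / sqrt(n) * (F_[nt](f) - mu f)."""
    n = len(xs)
    m = math.floor(n * t)
    if m == 0:
        return 0.0  # empty sum; F_0 is not needed because the prefactor vanishes
    empirical_mean = sum(f(x) for x in xs[:m]) / m
    return m / math.sqrt(n) * (empirical_mean - mu_f)


def indicator(x):
    """Indicator function of the interval [0, 0.5]."""
    return 1.0 if x <= 0.5 else 0.0


sample = [0.1, 0.7, 0.4, 0.9]
value = sequential_empirical_process(sample, indicator, 0.5, 0.75)
same_value = via_empirical_mean(sample, indicator, 0.5, 0.75)
```

For $t = 0.75$ and $n = 4$ only the first three observations enter the sum, since $[nt] = 3$.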

For fixed $n \in \mathbb{N}^*$, we consider $U_n$ as a random element in the metric space $\ell^\infty(\mathcal{F} \times [0,1])$
of bounded real-valued functions on $\mathcal{F} \times [0,1]$, equipped with the supremum norm and the
corresponding Borel $\sigma$-field. Since $\mathcal{F} \times [0,1]$ is uncountable, here we cannot assume that
$U_n$ is measurable and thus standard techniques of weak convergence do not apply. We will
therefore use the theory of outer probability and expectation (see van der Vaart and Wellner
(1996)).

Let $E^* X$ denote the outer expectation of a possibly non-measurable random element
$X$, let $U$ be measurable, and let $U, U_0, U_1, \ldots$ take values in $\ell^\infty(\mathcal{F} \times [0,1])$. We define
convergence in distribution or weak convergence $U_n \rightsquigarrow U$ in $\ell^\infty(\mathcal{F} \times [0,1])$ as the convergence
$E^*(\varphi(U_n)) \to E(\varphi(U))$ for all bounded and continuous functions $\varphi : \ell^\infty(\mathcal{F} \times [0,1]) \to \mathbb{R}$. We
say that the process $(X_i)_{i \in \mathbb{N}}$ satisfies a sequential empirical CLT if the process $U_n$ converges
in distribution in $\ell^\infty(\mathcal{F} \times [0,1])$ to a tight centred Gaussian process.

Empirical CLTs usually require some bound on the size of the indexing class $\mathcal{F}$. This size is
usually measured by counting certain sets, e.g. balls or brackets of a given $\|\cdot\|_s$-size, needed
to cover $\mathcal{F}$ (cf. Ossiander (1987) and van der Vaart and Wellner (1996, p. 83 ff.)). In our
upcoming setting, we will only deal with properties for functions of a restricted class which
could be disjoint from the class $\mathcal{F}$. We thus need an adapted notion of bracketing numbers.
This notion was introduced in Dehling, Durieu, and Tusche (2012).

Definition. Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a probability space. For two functions $l, u : \mathcal{X} \to \mathbb{R}$ such that
$l(x) \le u(x)$ for all $x \in \mathcal{X}$, we define the bracket

$$[l,u] := \{ f : \mathcal{X} \to \mathbb{R} : l(x) \le f(x) \le u(x) \text{ for all } x \in \mathcal{X} \}.$$

Let $\mathcal{G}$ be a subset of a normed real vector space $(\mathcal{C}, \|\cdot\|_{\mathcal{C}})$ of measurable real-valued functions
on $\mathcal{X}$. For given $\varepsilon > 0$, $A > 0$, and $s \in [1, \infty]$, we call $[l,u]$ an $(\varepsilon, A, \mathcal{G}, L^s(\mu))$-bracket, if
$l, u \in \mathcal{G}$ and

$$\|u - l\|_s \le \varepsilon, \qquad \|u\|_{\mathcal{C}} \le A, \qquad \|l\|_{\mathcal{C}} \le A.$$

For a class of real-valued functions $\mathcal{F}$ on $\mathcal{X}$, we define the bracketing number

$$N(\varepsilon, A, \mathcal{F}, \mathcal{G}, L^s(\mu))$$

as the smallest number of $(\varepsilon, A, \mathcal{G}, L^s(\mu))$-brackets needed to cover $\mathcal{F}$.

This notion of brackets makes it possible to control the number of brackets needed to cover
$\mathcal{F}$ not only with respect to the rate at which the size of the brackets in $L^s$-norm decreases,
but also with respect to the rate at which the $\|\cdot\|_{\mathcal{C}}$-size of the bracketing functions increases
as the $L^s$-norm goes to zero.
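
To make the trade-off concrete, here is a small sketch under assumptions that are not in the original text: $\mu$ is the uniform distribution on $[0,1]$, $\mathcal{F} = \{1_{[0,t]} : t \in [0,1]\}$, and the bracket functions are piecewise linear ramps measured in the norm $\sup|g| + \mathrm{Lip}(g)$. With mesh and ramp width $\varepsilon^s/2$, each bracket has $L^s$-size at most $\varepsilon$, while the norm of the bracket functions grows like $1 + 2\varepsilon^{-s}$. All names are hypothetical.

```python
import math


def ramp(x, a, h):
    """Lipschitz function equal to 1 for x <= a, 0 for x >= a + h, linear in between."""
    if x <= a:
        return 1.0
    if x >= a + h:
        return 0.0
    return 1.0 - (x - a) / h


def build_brackets(eps, s):
    """Brackets [lower, upper] covering all indicators 1_[0,t], t in [0, 1]."""
    step = eps ** s / 2  # grid mesh and ramp width
    count = math.ceil(1 / step)
    brackets = []
    for j in range(count):
        t_lo, t_hi = j * step, min((j + 1) * step, 1.0)
        # lower vanishes from t_lo on, so lower <= 1_[0,t] whenever t >= t_lo
        lower = lambda x, a=t_lo - step: ramp(x, a, step)
        # upper equals 1 up to t_hi, so upper >= 1_[0,t] whenever t <= t_hi
        upper = lambda x, a=t_hi: ramp(x, a, step)
        brackets.append((t_lo, t_hi, lower, upper))
    norm_bound = 1 + 1 / step  # sup norm (at most 1) plus Lipschitz constant 1 / step
    return brackets, norm_bound


def ls_size(lower, upper, s, grid=5000):
    """Midpoint-rule approximation of the L^s(uniform[0,1]) norm of upper - lower."""
    total = sum(
        (upper((i + 0.5) / grid) - lower((i + 0.5) / grid)) ** s for i in range(grid)
    )
    return (total / grid) ** (1 / s)


brackets, norm_bound = build_brackets(eps=0.2, s=2)
largest = max(ls_size(lo, up, 2) for _, _, lo, up in brackets)
```

In this toy case the norm of the bracket functions grows only polynomially in $1/\varepsilon$, so it stays far below any bound of exponential type in $1/\varepsilon$.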

2.2 $\mathcal{B}$-geometrically ergodic Markov chains

In the following, let $(X_i)_{i \in \mathbb{N}}$ be a time homogeneous Markov chain on a measurable state
space $(\mathcal{X}, \mathcal{A})$ with a transition probability $P$ and an invariant measure $\nu$. We assume that
the Markov chain starts with initial distribution $\nu$, i.e. that the distribution of $X_0$ is $\nu$. This
makes $(X_i)_{i \in \mathbb{N}}$ a stationary sequence. We also denote by $P$ the associated Markov operator
defined by

$$Pf(x) := \int_{\mathcal{X}} f(y)\, P(x, dy).$$

We assume that there exists a complex Banach space $(\mathcal{B}, \|\cdot\|_{\mathcal{B}})$ of measurable functions
from $\mathcal{X}$ to $\mathbb{C}$ such that $P$ is a bounded linear operator on $\mathcal{B}$. We denote by $\mathcal{L}(\mathcal{B})$ the space
of bounded linear operators from $\mathcal{B}$ to $\mathcal{B}$. We will need the following properties of the space
$\mathcal{B}$:

(A) $1_{\mathcal{X}} \in \mathcal{B}$, $|f|$ and $\bar{f} \in \mathcal{B}$ for all $f \in \mathcal{B}$, and the Dirac measures $\delta_x$ are continuous on $\mathcal{B}$.

Moreover, for some $m \in [1, \infty]$,

(B) $\mathcal{B}$ is continuously included in $L^m(\nu)$, i.e. there is a $K > 0$ such that $\|\cdot\|_m \le K \|\cdot\|_{\mathcal{B}}$.

Further we consider processes such that the action of the corresponding Markov operator
on $\mathcal{B}$ satisfies

(C) $\|P^n f - (\nu f) 1_{\mathcal{X}}\|_{\mathcal{B}} \le K \|f\|_{\mathcal{B}}\, \theta^n$ for some $K > 0$, $\theta \in [0, 1)$, and all $f \in \mathcal{B}$.

This property is often referred to as strong or geometric ergodicity with respect to $\mathcal{B}$ (cf.
Meyn and Tweedie (1993), Hervé (2008), and Hervé and Pène (2010)).

Remark 1. Note that condition (C) corresponds to a spectral gap property of $P$ acting on
$\mathcal{B}$, i.e. 1 is the only eigenvalue of modulus one, it is simple, and the rest of the spectrum is
contained in a disk of radius strictly smaller than one. Further, in this case there exists a
decomposition of the linear operator $P$ in $\mathcal{L}(\mathcal{B})$,

$$P = \Pi + N,$$

such that $\Pi f = (\nu f) 1_{\mathcal{X}}$ is a projection on the eigenspace of 1, $N \circ \Pi = \Pi \circ N = 0$, and
$\rho(N) := \lim_{n \to \infty} \|N^n\|_{\mathcal{L}(\mathcal{B})}^{1/n} < 1$, where $\|\cdot\|_{\mathcal{L}(\mathcal{B})}$ denotes the operator norm on $\mathcal{B}$.
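
The decomposition can be made tangible on a toy example that is not taken from the paper: a two-state Markov chain, with $\mathcal{B}$ taken to be all functions on the state space under the supremum norm. The transition matrix below is a hypothetical choice whose second eigenvalue is $0.7$; the sketch builds $\Pi$ and $N$ and estimates $\rho(N)$ from $\|N^n\|^{1/n}$.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def stationary(p, iters=10_000):
    """Invariant probability vector nu = nu P, obtained by power iteration."""
    nu = [1.0 / len(p)] * len(p)
    for _ in range(iters):
        nu = [sum(nu[i] * p[i][j] for i in range(len(p))) for j in range(len(p))]
    return nu


def sup_operator_norm(m):
    """Operator norm induced by the supremum norm: maximal absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)


P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical chain with eigenvalues 1 and 0.7
nu = stationary(P)
Pi = [nu[:] for _ in P]  # (Pi f)(x) = nu f for every state x
N = [[P[i][j] - Pi[i][j] for j in range(len(P))] for i in range(len(P))]

power = N
for _ in range(39):
    power = mat_mul(power, N)  # after the loop, power = N^40
rho_estimate = sup_operator_norm(power) ** (1 / 40)
```

Here `rho_estimate` approximates the spectral radius of $N$, and $\|P^n f - (\nu f) 1\|_\infty = \|N^n f\|_\infty$ decays at that geometric rate, which is condition (C) in this finite setting.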

For a function $f$ from $\mathcal{X}$ to $\mathbb{R}$, using Fourier kernels, we introduce the perturbed operators

$$(P_{f,t}\varphi)(x) = P(e^{itf}\varphi)(x) = \int_{\mathcal{X}} e^{itf(y)} \varphi(y)\, P(x, dy), \qquad t \in \mathbb{R}.$$

In order to apply the Nagaev method (cf. Hennion and Hervé (2001)), we also need the
following condition for some real vector space $\mathcal{C}$ of functions from $\mathcal{X}$ to $\mathbb{R}$, which will be
specified later in the applications.

(D) For all $f \in \mathcal{C}$, for $t$ in a neighbourhood of $0$, we have that $P_{f,t} \in \mathcal{L}(\mathcal{B})$ and further that
$t \mapsto P_{f,t}$ is two times continuously differentiable with derivative given by

We will see that the conditions (A) - (D) guarantee a sequential finite dimensional CLT
for functions in $\mathcal{C}$ (see Proposition 4 in Section 6). Note that, in applications, $\mathcal{C}$ will be
chosen as a subset of $\mathcal{B}$.
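
The role of the perturbed operators in the Nagaev method rests on the identity $\nu(P_{f,t}^n 1) = E \exp\bigl(it \sum_{k=1}^{n} f(X_k)\bigr)$ for $X_0 \sim \nu$. The following sketch checks this identity by brute force on a hypothetical two-state chain; the chain, the function $f$, and all names are assumptions chosen for illustration and do not come from the paper.

```python
import cmath
from itertools import product


def perturbed_operator(p, f, t):
    """Matrix of P_{f,t}: phi -> P(e^{itf} phi) on a finite state space."""
    d = len(p)
    return [[p[x][y] * cmath.exp(1j * t * f[y]) for y in range(d)] for x in range(d)]


def char_fn_via_operator(p, nu, f, t, n):
    """nu(P_{f,t}^n 1), computed by applying the operator n times to the constant 1."""
    op = perturbed_operator(p, f, t)
    d = len(p)
    phi = [1.0 + 0j] * d
    for _ in range(n):
        phi = [sum(op[x][y] * phi[y] for y in range(d)) for x in range(d)]
    return sum(nu[x] * phi[x] for x in range(d))


def char_fn_brute_force(p, nu, f, t, n):
    """E exp(it (f(X_1) + ... + f(X_n))) by summing over all paths (X_0, ..., X_n)."""
    total = 0j
    for path in product(range(len(p)), repeat=n + 1):
        prob = nu[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= p[a][b]
        total += prob * cmath.exp(1j * t * sum(f[x] for x in path[1:]))
    return total


P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical transition matrix
nu = [2 / 3, 1 / 3]           # its invariant distribution
f = [0.0, 1.0]
via_operator = char_fn_via_operator(P, nu, f, 0.7, 4)
via_paths = char_fn_brute_force(P, nu, f, 0.7, 4)
```

The operator route costs $n$ matrix-vector products, whereas the path sum grows exponentially in $n$; this is why regularity of $t \mapsto P_{f,t}$ near $t = 0$ gives direct access to the characteristic function of partial sums.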

Now to establish a tightness property of the empirical process, the following further
condition on the space $\mathcal{B}$ is useful.

(E) There exist $C > 0$ and $\ell \in \mathbb{N}^*$ such that, if $f \in \mathcal{B}$ and $g \in \mathcal{B}$ are bounded by 1, then
$fg \in \mathcal{B}$ and $\|fg\|_{\mathcal{B}} \le C \max\{\|f\|_{\mathcal{B}}, \|g\|_{\mathcal{B}}\}^{\ell}$.

Note that if $\mathcal{B}$ is a Banach algebra, condition (E) holds with $\ell = 2$. If further $\mathcal{C}$ is a subset of
$\mathcal{B}$, then for every $f \in \mathcal{C}$, the mapping $t \mapsto P_{f,t}$ is an entire function and therefore condition
(D) is also satisfied.

To derive a CLT for an $\mathcal{F}$-indexed empirical process, we now have to specify the relation
between the class $\mathcal{F}$ and the Banach space $\mathcal{B}$ or, more precisely, between $\mathcal{F}$ and the vector
space $\mathcal{C}$ which satisfies condition (D). Note that, in the particular case where $\mathcal{F}$ is a subset
of $\mathcal{C}$, from (A) - (D) we can infer the finite dimensional convergence of the process $(U_n)_{n \in \mathbb{N}}$.
Then, the tightness can be established under an entropy condition on $\mathcal{F}$ that uses the
usual bracketing number defined as in Ossiander (1987). Nevertheless, in many examples,
the functions of $\mathcal{F}$ do not belong to the space $\mathcal{B}$. To overcome this difficulty, we have to
measure how well the functions of $\mathcal{F}$ can be approximated by the functions of $\mathcal{B}$. We will use
the bracketing numbers introduced in the preceding section to obtain a control on the size of
$\mathcal{F}$ which depends on the possibility of approximation by the space $\mathcal{B}$. Since $\mathcal{F}$ is composed
of real-valued functions, we concentrate on real-valued functions in the space $\mathcal{B}$. We denote
by $\mathcal{B}_{\mathbb{R}}$ the subset of $\mathcal{B}$ composed of real-valued functions. Note that $(\mathcal{B}_{\mathbb{R}}, \|\cdot\|_{\mathcal{B}})$ is a real
Banach space. Our conditions on the Markov chain (in particular condition (C)) enable
us to deal with bracketing numbers allowing an exponential growth of the $\mathcal{B}$-norm of the
bracket functions as the $\|\cdot\|_s$-size of the bracket goes to zero. This leads to the following
entropy condition.

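For some $s \in [1, \infty]$ and $\mathcal{G} \subset \mathcal{B}_{\mathbb{R}}$,

(F) there exist $C > 0$, $r > -1$, and $\gamma > 1$ such that

$$\int_0^1 \varepsilon^r \sup_{\varepsilon \le \delta \le 1} N^2\bigl(\delta, \exp(C\delta^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\nu)\bigr)\, d\varepsilon < \infty. \qquad (1)$$

Remark 2. Observe that for $r' > 0$, inequality (1) holds for all $r > 2r' - 1$, if

$$N\bigl(\varepsilon, \exp(C\varepsilon^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\mu)\bigr) = O(\varepsilon^{-r'}) \quad \text{as } \varepsilon \to 0.$$

Note further that the supremum appears in order to deal with the possible non-monotonicity
of the bracketing number.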
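
The arithmetic behind Remark 2 can be written out in one line. The following is a sketch under the simplifying assumption (stronger than the $O$-statement in the remark) that the bracketing number is bounded by $c\,\delta^{-r'}$ for all $\delta \in (0,1]$ with some constant $c > 0$; since $\delta \mapsto \delta^{-2r'}$ is decreasing, the supremum over $\delta \in [\varepsilon, 1]$ is then at most $c^2 \varepsilon^{-2r'}$.

```latex
\int_0^1 \varepsilon^{r} \sup_{\varepsilon \le \delta \le 1}
  N^2\bigl(\delta, \exp(C\delta^{-1/\gamma}), \mathcal{F}, \mathcal{G}, L^s(\nu)\bigr)\, d\varepsilon
\;\le\; c^2 \int_0^1 \varepsilon^{\,r - 2r'}\, d\varepsilon
\;=\; \frac{c^2}{r - 2r' + 1} \;<\; \infty
\qquad \text{whenever } r - 2r' > -1, \text{ i.e. } r > 2r' - 1.
```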
We can now state the sequential empirical central limit theorem, which is proved in
Section 6.

Theorem 1 (Sequential empirical CLT for $\mathcal{B}$-geometrically ergodic Markov chains). Let $\mathcal{F}$
be a $\|\cdot\|_\infty$-bounded class of functions from $\mathcal{X}$ to $\mathbb{R}$. Assume that for some $m \in [1, \infty]$, the
conditions (A), (B), (C), and (E) hold. If there is a $\|\cdot\|_\infty$-bounded subset $\mathcal{G} \subset \mathcal{B}_{\mathbb{R}}$ such that
(D) is satisfied for $\mathcal{C} = \mathrm{Vect}_{\mathbb{R}}(\mathcal{G})$, the smallest real vector space containing $\mathcal{G}$, and if (F) is
satisfied with $s = m/(m-1)$, then the sequential empirical process converges in distribution
in $\ell^\infty(\mathcal{F} \times [0,1])$ to a centred Gaussian process $K$ with covariance structure given by

$$\operatorname{Cov}(K(f_1,t_1), K(f_2,t_2)) = \min\{t_1,t_2\} \left\{ \sum_{k=0}^{\infty} \operatorname{Cov}(f_1(X_0), f_2(X_k)) + \sum_{k=1}^{\infty} \operatorname{Cov}(f_1(X_k), f_2(X_0)) \right\}. \qquad (2)$$

Remark 3. A centred Gaussian process $K$ with covariance structure (2) is often referred to
as a Kiefer process.

Now, let us give an example by applying Theorem 1 to random iterative Lipschitz models. 
2.3 Iterative Lipschitz models that contract on average 

In this section, we assume that $(\mathcal{X}, d)$ is a (not necessarily compact) metric space in which
every closed ball is compact. Further we assume that $\mathcal{X}$ is equipped with the Borel
$\sigma$-algebra $\mathfrak{B}(\mathcal{X})$. Let $\{T_i, i \ge 0\}$ be a family of Lipschitz maps from $\mathcal{X}$ to $\mathcal{X}$. We consider
the Markov chain with state space $\mathcal{X}$ and transition probability $P$ given by

$$P(x, A) = \sum_{i \ge 0} p_i(x)\, 1_A(T_i(x)), \qquad x \in \mathcal{X},\ A \in \mathfrak{B}(\mathcal{X}),$$

where the $p_i$ are Lipschitz functions from $\mathcal{X}$ to $[0,1]$ which satisfy $\sum_{i \ge 0} p_i(x) = 1$ for all
$x \in \mathcal{X}$. Thus, each step of the Markov chain corresponds to the application of one of the
maps $T_i$, which is chosen randomly with respect to a probability distribution that depends
on the current state of the chain. We assume that this model has a property of contraction
on average, that is, there exists a $\rho \in (0, 1)$ such that

$$\sum_{i \ge 0} d(T_i(x), T_i(y))\, p_i(z) \le \rho\, d(x,y), \qquad \forall x, y, z \in \mathcal{X}. \qquad (3)$$
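
A toy model, which is not taken from the paper, shows how contraction on average can hold although one of the maps expands. The sketch assumes $\mathcal{X} = \mathbb{R}$ with $d(x,y) = |x - y|$, two affine maps, and constant selection probabilities (a special case of Lipschitz $p_i$); the left-hand side of (3) then equals $\rho\, |x - y|$ with $\rho = 0.5$. All names and numerical values are hypothetical.

```python
import random

MAPS = [lambda x: 0.25 * x, lambda x: 1.5 * x + 1.0]  # T_0 contracts, T_1 expands
PROBS = [0.8, 0.2]                                    # constant p_i (assumed)
RHO = 0.8 * 0.25 + 0.2 * 1.5                          # average Lipschitz constant, 0.5


def average_contraction(x, y):
    """Left-hand side of (3): sum_i d(T_i x, T_i y) p_i, with constant p_i."""
    return sum(p * abs(T(x) - T(y)) for T, p in zip(MAPS, PROBS))


def simulate(x0, steps, rng):
    """Path of the Markov chain: at each step apply a randomly chosen map."""
    x, path = x0, [x0]
    for _ in range(steps):
        T = rng.choices(MAPS, weights=PROBS)[0]
        x = T(x)
        path.append(x)
    return path


# Two chains driven by the same sequence of maps but started at different points.
path_a = simulate(-50.0, 200, random.Random(1))
path_b = simulate(80.0, 200, random.Random(1))
final_gap = abs(path_a[-1] - path_b[-1])
```

Because both paths use the same random choices, their distance is multiplied by $0.25$ or $1.5$ at each step, and it collapses quickly, which reflects the attractivity of the invariant measure discussed below.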

Statistical properties of such models have been studied by Dubins and Freedman (1966),
Barnsley and Elton (1988), Hennion and Hervé (2001), Wu and Shao (2004), Hervé (2008),
and by Hervé and Pène (2010) in the case of constant functions $p_i$ and by Doeblin and Fortet
(1937), Karlin (1953), Barnsley, Demko, Elton, and Geronimo (1988), Peigné (1993),
Pollicott (2001), and by Walkden (2007) in the case of variable functions $p_i$.

As in many of the cited papers, we need the following technical properties. For some fixed
$x_0 \in \mathcal{X}$, suppose

$$\sup_{x,y,z \in \mathcal{X},\, y \ne z} \sum_{i \ge 0} \frac{d(T_i(y), T_i(z))}{d(y,z)}\, p_i(x) < \infty, \qquad (4)$$

$$\sup_{x,y \in \mathcal{X}} \sum_{i \ge 0} \frac{d(T_i(y), x_0)}{1 + d(y, x_0)}\, p_i(x) < \infty, \qquad (5)$$

$$\sup_{x \in \mathcal{X}} \sum_{i \ge 0} \frac{d(T_i(x), x_0)}{1 + d(x, x_0)} \sup_{y,z \in \mathcal{X},\, y \ne z} \frac{|p_i(y) - p_i(z)|}{d(y,z)} < \infty. \qquad (6)$$

Moreover assume that for all $x, y \in \mathcal{X}$, there exist sequences of integers $(i_n)_{n \ge 1}$ and $(j_n)_{n \ge 1}$
such that

$$d\bigl(T_{i_n} \circ \cdots \circ T_{i_1}(x),\, T_{j_n} \circ \cdots \circ T_{j_1}(y)\bigr) \bigl(1 + d(T_{j_n} \circ \cdots \circ T_{j_1}(x), x_0)\bigr) \to 0 \quad \text{as } n \to \infty \qquad (7)$$

with $p_{i_n}(T_{i_{n-1}} \circ \cdots \circ T_{i_1}(x)) \cdots p_{i_1}(x) > 0$ and $p_{j_n}(T_{j_{n-1}} \circ \cdots \circ T_{j_1}(y)) \cdots p_{j_1}(y) > 0$.
Note that conditions (4) - (6) are verified when the family of maps $T_i$ is finite and (7) is
verified when (3) - (6) hold and each $p_i$ is positive. See Peigné (1993) for a discussion of
these assumptions.

Under the conditions (3) - (7), Peigné (1993) proved that the Markov chain has an
attractive $P$-invariant probability measure $\nu$ with finite first moment. We define the
stationary process $(X_i)_{i \ge 0}$ on $\mathcal{X}$ as the Markov chain started with distribution $\nu$, that is
$X_0 \sim \nu$.

A central limit theorem for the empirical process associated to the Markov chain $(X_i)_{i \ge 0}$
was proved by Durieu (2013) (see also Wu and Shao (2004) in the case of constant functions
$p_i$). The following theorem extends this result to the sequential empirical process.

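For $\alpha \in (0, 1]$ and $\mathbb{K} = \mathbb{C}$ or $\mathbb{K} = \mathbb{R}$, we consider the space $\mathcal{H}_\alpha(\mathcal{X}, \mathbb{K})$ of bounded $\alpha$-Hölder
continuous functions on $\mathcal{X}$ with values in $\mathbb{K}$, equipped with the norm

$$\|\cdot\|_\alpha := \|\cdot\|_\infty + m_\alpha(\cdot),$$

where

$$m_\alpha(f) := \sup_{x,y \in \mathcal{X},\, x \ne y} \frac{|f(x) - f(y)|}{d(x,y)^\alpha}.$$

Theorem 2. Let (3) - (7) hold and consider a $\|\cdot\|_\infty$-bounded class of functions $\mathcal{F}$. Let
$s \in (1,2)$ and $\mathcal{G}$ be a $\|\cdot\|_\infty$-bounded subset of the space $\mathcal{H}_\alpha(\mathcal{X}, \mathbb{R})$ for some $\alpha < \frac{s-1}{s}$ such
that (F) holds. Then the $\mathcal{F}$-indexed sequential empirical process $(U_n(f,t))_{\mathcal{F} \times [0,1]}$ associated
to the process $(X_i)_{i \ge 0}$ converges in distribution in the space $\ell^\infty(\mathcal{F} \times [0,1])$ to a centred Kiefer
process with covariance given by (2).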

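Proof. First, we introduce spaces of Lipschitz functions with weights that give the geometric
ergodicity of the chain. For every $\alpha, \beta \in [0, 1]$, let $\mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$ denote the space of continuous
functions from $\mathcal{X}$ to $\mathbb{C}$ with $\|f\|_{\alpha,\beta} = N_\beta(f) + m_{\alpha,\beta}(f) < \infty$, where

$$N_\beta(f) = \sup_{x \in \mathcal{X}} \frac{|f(x)|}{(1 + d(x, x_0))^\beta} \quad \text{and} \quad m_{\alpha,\beta}(f) = \sup_{x,y \in \mathcal{X},\, x \ne y} \frac{|f(x) - f(y)|}{d(x,y)^\alpha (1 + d(x, x_0))^\beta}.$$

In particular, the space $\mathcal{H}_\alpha(\mathcal{X}, \mathbb{C}) := \mathcal{H}_{\alpha,0}(\mathcal{X}, \mathbb{C})$ is the space of bounded $\alpha$-Hölder functions
from $\mathcal{X}$ to $\mathbb{C}$ and we have $\|\cdot\|_{\alpha,0} = \|\cdot\|_\alpha$. It is a subspace of $\mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$ for all $\beta > 0$. The
following properties are straightforward and given without proofs.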

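Lemma 1. For all $\alpha$ and $\beta \in [0, 1]$,

(i) the space $(\mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C}), \|\cdot\|_{\alpha,\beta})$ is a Banach space which satisfies condition (A),

(ii) for every bounded $f, g \in \mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$, we have that
$\|fg\|_{\alpha,\beta} \le \|f\|_\infty \|g\|_{\alpha,\beta} + \|g\|_\infty \|f\|_{\alpha,\beta}$,

(iii) for every $f \in \mathcal{H}_\alpha(\mathcal{X}, \mathbb{C})$ and $g \in \mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$, we have that $\|fg\|_{\alpha,\beta} \le \|f\|_\alpha \|g\|_{\alpha,\beta}$,

(iv) there exists a $C > 0$ such that, for every $f \in \mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$, $\|f\|_{1/\beta} \le C N_\beta(f)$.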

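Therefore condition (B) holds with $m = 1/\beta$ as a consequence of (iv) and condition (E)
is satisfied due to (ii). Now, according to Theorem 1 in Peigné (1993), we obtain for all
$\alpha, \beta \in (0, 1/2)$ with $\alpha < \beta$ that $P$ is a bounded linear operator on $\mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$ which satisfies
condition (C).

It remains to verify condition (D). We consider the space $\mathcal{H}_\alpha(\mathcal{X}, \mathbb{R})$ of bounded
real-valued $\alpha$-Hölder functions on $\mathcal{X}$. Let $f$ be a function of this space and consider the perturbed
operator defined by $P_{f,t}\varphi = P(e^{itf}\varphi)$. Using $|e^{ia} - e^{ib}| \le |a - b|$, we get that $e^{itf} \in \mathcal{H}_\alpha(\mathcal{X}, \mathbb{C})$
for all $t \in \mathbb{R}$. Thus, for every $\varphi \in \mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$ and $t \in \mathbb{R}$, by condition (iii) of Lemma 1,
we have $e^{itf}\varphi \in \mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$.

Since $P \in \mathcal{L}(\mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C}))$, we infer that $P_{f,t} \in \mathcal{L}(\mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C}))$
for all $t \in \mathbb{R}$. Further, using again condition (iii) of Lemma 1, we see that $t \mapsto P_{f,t}$ is an
analytic function from $\mathbb{R}$ to $\mathcal{L}(\mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C}))$, given by $P_{f,t}\varphi = \sum_{k \ge 0} P((if)^k \varphi)\, t^k / k!$. We
infer that (D) holds over the space $\mathcal{H}_\alpha(\mathcal{X}, \mathbb{R})$.

We now apply Theorem 1. Let $s$, $\alpha$, and $\mathcal{G}$ be as in the statement of Theorem 2. By
choosing $\beta = (s-1)/s < \frac{1}{2}$, we have $\alpha < \beta$ and thus (A) - (E) hold for the space
$\mathcal{B} = \mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$ with $m = 1/\beta$ and $\mathcal{C} = \mathcal{H}_\alpha(\mathcal{X}, \mathbb{R})$. Further, for any $g \in \mathcal{G}$, we have
$g \in \mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$ and $\|g\|_{\alpha,\beta} \le \|g\|_\alpha$. Therefore, condition (F) is also satisfied with respect to
the $\mathcal{H}_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$-norm. □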

2.4 Dynamical Systems with a Spectral Gap 

Let us mention that, as usual, the proof of Theorem 1 can be adapted to deal with dynamical
systems using the Perron-Frobenius operator in place of the Markov operator. Let $(\mathcal{X}, \mathcal{A}, \mu)$
be a probability space and let $T$ be a measure preserving transformation of $\mathcal{X}$, that is
$\mu(T^{-1}A) = \mu(A)$ for all $A \in \mathcal{A}$. The Perron-Frobenius operator $P$ is defined on $L^1(\mu)$ by
the equation

$$\mu(f \cdot Pg) = \mu(f \circ T \cdot g), \qquad \forall f \in L^\infty(\mu),\ g \in L^1(\mu).$$

Further, for a function $f$ on $\mathcal{X}$, we define the perturbed operator by $P_{f,t}\varphi = P(e^{itf}\varphi)$. We
have the following result, whose proof follows that of Theorem 1 and is left to the
reader.
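Theorem 3 (Sequential empirical CLT for dynamical systems with a spectral gap). Let
$\mathcal{F}$ be a $\|\cdot\|_\infty$-bounded class of functions from $\mathcal{X}$ to $\mathbb{R}$. Assume that there exist a Banach
space $\mathcal{B}$ and a real number $m > 1$ such that the conditions (A), (B), (C), and (E) hold
with respect to the Perron-Frobenius operator and replacing $\nu$ by $\mu$. If there exists a
$\|\cdot\|_\infty$-bounded subset $\mathcal{G} \subset \mathcal{B}_{\mathbb{R}}$ such that (D) holds for the space $\mathcal{C} = \mathrm{Vect}_{\mathbb{R}}(\mathcal{G})$ and (F) holds for
$s = \frac{m}{m-1}$, then the process $(U_n(f,t))_{\mathcal{F} \times [0,1]}$, defined by

$$U_n(f,t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt]} \bigl( f \circ T^i - \mu f \bigr),$$

converges in distribution in $\ell^\infty(\mathcal{F} \times [0,1])$ to a centred Gaussian process $K$ with covariance
structure given by

$$\operatorname{Cov}(K(f_1,t_1), K(f_2,t_2)) = \min\{t_1,t_2\} \left\{ \sum_{k=0}^{\infty} \operatorname{Cov}(f_1, f_2 \circ T^k) + \sum_{k=1}^{\infty} \operatorname{Cov}(f_1 \circ T^k, f_2) \right\}.$$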

As a possible application, we can extend the empirical CLT proved by Collet, Martinez,
and Schmitt (2004) for a class of expanding maps of the interval to a sequential empirical
CLT. In the situation considered in Collet, Martinez, and Schmitt (2004), the spectral gap
property can be established on the space of functions of bounded variation. Gouëzel (2009)
gave examples of expanding maps of the interval for which the Perron-Frobenius operator
does not act on the space of functions of bounded variation, but acts on the space of Lipschitz
functions with a spectral gap property. These examples also satisfy the assumptions of our
theorem and thus sequential empirical CLTs can be proved. Note that the space of Lipschitz
functions is a Banach algebra and thus conditions (D) and (E) are trivially satisfied. Further,
the usual class of indicator functions of intervals can be well approximated by Lipschitz
functions, and condition (F) is verified for this class; see also Section 2.5.
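
For a concrete feel of the Perron-Frobenius operator, consider the doubling map $T(x) = 2x \bmod 1$ on $[0,1]$ with Lebesgue measure. This example is an assumption made for illustration and is not one of the maps treated in the cited papers. Its transfer operator has the explicit form $Pg(x) = \frac{1}{2}\bigl(g(x/2) + g((x+1)/2)\bigr)$, and the sketch checks the defining duality numerically with a midpoint rule.

```python
import math


def doubling_map(x):
    """T(x) = 2x mod 1, which preserves Lebesgue measure on [0, 1]."""
    return (2.0 * x) % 1.0


def transfer(g):
    """Perron-Frobenius operator of the doubling map with respect to Lebesgue measure."""
    return lambda x: 0.5 * (g(x / 2.0) + g((x + 1.0) / 2.0))


def integrate(h, grid=20_000):
    """Midpoint rule on [0, 1]."""
    return sum(h((i + 0.5) / grid) for i in range(grid)) / grid


f = lambda x: math.cos(2.0 * math.pi * x)
g = lambda x: x * x

lhs = integrate(lambda x: f(doubling_map(x)) * g(x))  # mu(f o T . g)
rhs = integrate(lambda x: f(x) * transfer(g)(x))      # mu(f . Pg)

# Geometric convergence of P^n g to mu(g) for the Lipschitz function g(x) = x.
identity = lambda x: x
third_iterate = transfer(transfer(transfer(identity)))
```

For $g(x) = x$ one finds $P^n g(x) - \frac{1}{2} = (x - \frac{1}{2})/2^n$, a simple instance of the geometric decay required in condition (C) on a space of Lipschitz functions.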

2.5 Indexing Classes of Functions 

To conclude this section, we present some classes of functions for which an estimate of the 
bracketing number can be computed. These examples, which satisfy condition (F), come 
from the paper by Dehling, Durieu, and Tusche (2012). 

For vectors $x = (x_1, \ldots, x_d)$, $y = (y_1, \ldots, y_d) \in \mathbb{R}^d$ write $x \le y$ if $x_i \le y_i$ for all $i \in
\{1, \ldots, d\}$. Further, denote the modulus of continuity of a real-valued function $F$ defined
on a subset of $\mathbb{R}^d$ by $\omega_F$. Recall that $\omega_F(t) := \sup\{|F(x) - F(y)| : |x - y| \le t\}$, where $|\cdot|$
denotes the corresponding Euclidean norm.

Proposition 1. For a metric space $\mathcal{X}$, equipped with a probability measure $\mu$, set $\mathcal{B} =
\mathcal{H}_\alpha(\mathcal{X}, \mathbb{R})$ and $\mathcal{G} = \{f \in \mathcal{B} : 0 \le f \le 1\}$. We have the following statements about our
entropy condition (F).

(i) Let $\mathcal{X} = \mathbb{R}^d$, $\mathcal{F} := \{1_{(t,u]} : t, u \in \mathbb{R}^d,\ t \le u\}$, and let $F$ denote the distribution function
of $\mu$. If there are some $s \in [1, \infty]$ and $\gamma > 1$ such that $\omega_F(x) = O(|\log(x)|^{-s\gamma})$ as
$x \to 0$, then condition (F) is satisfied.

(ii) Let $\mathcal{X} = \mathbb{R}^d$ and $\mathcal{F} := \{1_{E(x,r)} : x \in [0,1]^d,\ r \in [0,D]^d\}$ for any fixed $D > 0$, where
$E(x,r)$ denotes the ellipsoid $E(x,r) := \{y \in \mathbb{R}^d : \sum_{i=1}^{d} (x_i - y_i)^2 / r_i^2 \le 1\}$. If $\mu$ has
a bounded density w.r.t. the Lebesgue measure, then condition (F) is satisfied for all
$s \in [1, \infty]$.

(iii) In the situation of (ii), $\mathcal{F}$ can be replaced by $\{1_{E(x,r)} : x \in \mathbb{R}^d,\ r \in [0,D]^d\}$, if one
furthermore assumes that $\mu(\{x \in \mathbb{R}^d : |x| > t\}) = O(t^{-p})$ as $t \to \infty$ for some
$p \in (0,1)$.

(iv) For an arbitrary metric space $(\mathcal{X}, \rho)$ and $\mathcal{F} = \{1_{B(t)} : t > 0\}$, where $B(t) := \{x \in \mathcal{X} :
\rho(x_0, x) \le t\}$ for some fixed $x_0 \in \mathcal{X}$, let $G(t) := \mu(B(t))$, $t \ge 0$. Then condition (F) holds if
there are some $s \in [1, \infty]$ and $\gamma > 1$ such that $\omega_G(x) = O(|\log(x)|^{-s\gamma})$ as $x \to 0$.

(v) Let $\mathcal{X} = \mathbb{R}$, $\mathcal{F} = \{f_t : t \in [0,1]\}$, where the $f_t$ are functions from $\mathbb{R}$ to $\mathbb{R}$ which satisfy
(i) $0 \le f_t(x) \le 1$ for all $t \in [0,1]$ and $x \in \mathbb{R}$,
(ii) $f_s \le f_t$ for all $0 \le s \le t \le 1$,
(iii) $f_t$ is monotone increasing on $\mathbb{R}$ for all $t \in [0,1]$, and
(iv) $G_\mu(t) = \mu f_t$ is Lipschitz.

Further, let $F$ denote the distribution function of $\mu$. If there are some $s \in [1, \infty]$ and
$\gamma > 1$ such that $\omega_F(x) = O(|\log(x)|^{-s\gamma})$ as $x \to 0$, then condition (F) holds.



3 Sequential Empirical CLTs for Multiple Mixing Processes 

In this section, we present a more general result which can be applied in the setting of
Section 2. In particular, the approach used here is useful when the indexing class $\mathcal{F}$ is
disjoint from the space of functions on which we have good properties. In the more abstract
setting, our technique requires two basic assumptions concerning the process $(f(X_i))_{i \in \mathbb{N}}$,
where $f : \mathcal{X} \to \mathbb{R}$ belongs to some normed vector space $(\mathcal{C}, \|\cdot\|_{\mathcal{C}})$ of functionals on $\mathcal{X}$. We
assume that for some $\|\cdot\|_\infty$-bounded subset $\mathcal{G} \subset \mathcal{C}$ the following two properties hold.

Assumption 1 (Finite dimensional sequential CLT for $\mathcal{G}$-observables). For every choice of
$f_1, \ldots, f_k \in \mathcal{G}$ and $t_1, \ldots, t_k \in [0,1]$,

$$\frac{1}{\sqrt{n}} \left( \sum_{i=1}^{[nt_1]} \bigl( f_1(X_i) - \mu f_1 \bigr), \ldots, \sum_{i=1}^{[nt_k]} \bigl( f_k(X_i) - \mu f_k \bigr) \right) \xrightarrow{\ d\ } N(0, \Sigma),$$

where $N(0, \Sigma)$ denotes some $k$-dimensional normal distribution with mean zero and
covariance matrix $\Sigma = (\Sigma_{ij})_{1 \le i,j \le k}$.

Assumption 2 (Moment bounds for $\mathcal{G}$-observables). For fixed $p \in \mathbb{N}^*$, $s \ge 1$, and monotone
increasing functions $\Phi_1, \ldots, \Phi_p : \mathbb{R}_+ \to \mathbb{R}_+$, we consider the $2p$-th moment bound

$$E \left( \sum_{i=1}^{n} \bigl( f(X_i) - \mu f \bigr) \right)^{2p} \le \sum_{i=1}^{p} n^i\, \|f\|_s^i\, \Phi_i(\|f\|_{\mathcal{C}}) \quad \text{for all } f \in \mathcal{G}. \qquad (8)$$

With these assumptions we can show the following abstract sequential empirical CLT. 

Theorem 4. Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, let $(X_i)_{i \in \mathbb{N}}$ be an $\mathcal{X}$-valued stationary process
with marginal distribution $\mu$, and let $\mathcal{F}$ be a uniformly bounded class of measurable functions
on $\mathcal{X}$. Suppose that for some normed vector space $\mathcal{C}$ of measurable functions on $\mathcal{X}$, some
subset $\mathcal{G}$ of $\mathcal{C}$ which is bounded in $\|\cdot\|_\infty$-norm, some $p \in \mathbb{N}^*$, $s \ge 1$, and some monotone increasing
functions $\Phi_1, \ldots, \Phi_p : \mathbb{R}_+ \to \mathbb{R}_+$, Assumption 1 and Assumption 2 hold. Moreover,
assume that there exist a constant $r > -1$ and a monotone increasing function $\Psi : \mathbb{R}_+ \to \mathbb{R}_+$
such that

$$\int_0^1 \varepsilon^r \sup_{\varepsilon \le \delta \le 1} N^2\bigl(\delta, \Psi(\delta^{-1}), \mathcal{F}, \mathcal{G}, L^s(\mu)\bigr)\, d\varepsilon < \infty. \qquad (9)$$

If

$$\Phi_i(2\Psi(x)) = O(x^{\gamma_i}), \qquad (10)$$

for some non-negative constants $\gamma_i$ such that

$$\gamma_i < 2p - (i + r + 2), \qquad (11)$$

then the sequential empirical process $U_n$ converges in distribution in $\ell^\infty(\mathcal{F} \times [0,1])$ to a tight
Gaussian process $K$.

The proof can be found in Section 4. 

Remark 4. Note that the entropy bounds presented in Proposition 1 are strong versions of 
entropy conditions of the type in Theorem 4. 

In the general setting of Theorem 4, we cannot specify the covariance structure of the
limit process. The next lemma shows that under additional conditions, the limit process of
$U_n$ is indeed a Kiefer process (cf. Remark 3).

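Lemma 2. In the situation of Theorem 4, assume that

(i) Assumption 1 holds with covariance matrix $\Sigma$ given by

$$\Sigma_{ij} = \min\{t_i, t_j\} \left\{ \sum_{k=0}^{\infty} \operatorname{Cov}(f_i(X_0), f_j(X_k)) + \sum_{k=1}^{\infty} \operatorname{Cov}(f_j(X_0), f_i(X_k)) \right\},$$

(ii) there is a function $\Theta : \mathbb{N} \to \mathbb{R}_+$ and a constant $\beta > 1$ satisfying

$$\sum_{k=1}^{\infty} \Psi(k^\beta)\, \Theta(k) < \infty$$

such that for all $f \in \mathcal{G} \cup (\mathcal{G} - \mathcal{G})$ and all $\varphi \in L^s(\mu)$

$$|\operatorname{Cov}(\varphi(X_0), f(X_k))| \le \|\varphi\|_s\, \|f\|_{\mathcal{C}}\, \Theta(k).$$

Then the covariance structure of the limit process $K$ is given by (2).

The proof is given in Section 5.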

Note that Proposition 4 in Section 6 shows that Assumption 1 can be established in the
setting of Section 2.2. Assumption 2 has been verified for $p = 2$, $\Phi_1(x) = \log^3(x + 1)$,
and $\Phi_2(x) = \log^2(x + 1)$ by Durieu (2008), who considers Markov chains and dynamical
systems that support a spectral gap property. In a later work, Dehling and Durieu (2011)
generalized this result to general $p \in \mathbb{N}^*$ and $\Phi_i(x) = \log^{2p-i}(x + 1)$. More generally, they
show that for a process which satisfies the so-called multiple mixing condition w.r.t. $\mathcal{C}$, for
every $\|\cdot\|_\infty$-bounded $\mathcal{G} \subset \mathcal{C}$ and every $p \in \mathbb{N}^*$, there is a $c > 0$ such that Assumption 2 holds
with $\Phi_i(x) = c \log^{2p + (d_0 - 1)i}(x + 1)$. The multiple mixing condition is defined as follows.

Definition (Multiple Mixing Processes). We say that a process $(X_i)_{i \in \mathbb{N}}$ is multiple mixing
with respect to $\mathcal{C}$ if there exist a $\theta \in (0, 1)$ and an integer $d_0 \in \mathbb{N}$ such that for all $p \in \mathbb{N}^*$,
there exist an integer $\ell$ and a multivariate polynomial $P$ of total degree not larger than $d_0$
such that

$$\bigl| \operatorname{Cov}\bigl( f(X_{i_0}) \cdots f(X_{i_{q-1}}),\ f(X_{i_q}) \cdots f(X_{i_p}) \bigr) \bigr| \le \|f\|_s\, \|f\|_{\mathcal{C}}^{\ell}\, P(i_1 - i_0, \ldots, i_p - i_{p-1})\, \theta^{i_q - i_{q-1}} \qquad (12)$$

holds for all $f \in \mathcal{C}$ with $\mu f = 0$ and $\|f\|_\infty \le 1$, all integers $i_0 \le i_1 \le \cdots \le i_p$ and all
$q \in \{1, \ldots, p\}$.

Note that in the setting of Section 2.2, this property with $d_0 = 0$ can be derived from the
spectral gap property, see Lemma 5. For multiple mixing processes, we have the following
version of Theorem 4.

Theorem 5 (Sequential empirical CLT for multiple mixing random variables). Let (X, A) be 

a measurable space, let (Xi)i^ be a X-valued stationary process with marginal distribution 
fjL, and let T be a uniformly bounded class of measurable functions on X . Suppose that 
for some s > 1, the process (Aj) n6 N is multiple mixing w.r.t. a normed vector space C 
of measurable functions on X , where for every p £ N* the multivariate polynomial P in 
inequality (12) is of total degree not larger than do- If further Assumption 1 and condition 
(F) hold for some || • ||oo -bounded subset Q of C and 7 > do + 1, then the sequential empirical 
process U n converges in distribution in 1°°^ x [0, 1]) to a tight Gaussian process K . 
If further the covariance matrix E in Assumption 1 is given by 

s OO OO N 

=mm{t i ,t j }lY J Cov(f t (X ),f j (X k )) + ]T Cov^Xq), / 4 (X fc )) k 

^fc=0 k=l ' 

and if there are constants 8 € (0, 1) and D > such that for all f £ Q U (Q — Q) and all 
if £ L s (/i) 

\Cov( v (X ),f(X k ))\<D\\ i p\\ s \\f\\ c e k , 
then the covariance structure of the limit process K is given by (2). 

Proof. As mentioned above, multiple mixing processes satisfy Assumption 2 with $\Phi_i(x) = c \log^{2p+(d_0-1)i}(x+1)$
for some $c > 0$ depending only on $p$. Thus choosing $\Psi := \exp(C\, \mathrm{id}^{1/\gamma})$
for some $C > 0$ and $\gamma > 1$ (which gives a quite relaxed entropy condition concerning the
$\|\cdot\|_{\mathcal{C}}$-size), we have $\Phi_i(2\Psi(x)) = O(x^{(2p+(d_0-1)i)/\gamma})$. Therefore condition (11) holds for
sufficiently large $p \in \mathbb{N}^*$ if $\gamma > d_0 + 1$. The covariance structure of the limit process is a
direct consequence of Lemma 2 with $\Theta(k) = \theta^k$ and $\beta \in (1, \gamma)$. □
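
The role of the requirement $\gamma > d_0 + 1$ can be checked by direct arithmetic. The following Python sketch assumes that condition (11) reads $\gamma_i + i - 2p + r + 2 < 0$ for $i = 1, \ldots, p$ (a form inferred from how (11) is used in the proof of Proposition 3, not quoted from its statement) and that $\gamma_i = (2p + (d_0 - 1)i)/\gamma$ as in the proof above. The function names and the search cap `p_max` are illustrative choices.

```python
def condition_11_holds(p, gamma, d0, r):
    """Check gamma_i + i - 2p + r + 2 < 0 for all i = 1..p (assumed form of (11)),
    with gamma_i = (2p + (d0 - 1) * i) / gamma."""
    return all(
        (2 * p + (d0 - 1) * i) / gamma + i - 2 * p + r + 2 < 0
        for i in range(1, p + 1)
    )


def smallest_admissible_p(gamma, d0, r, p_max=10_000):
    """Return the smallest p <= p_max satisfying the assumed condition, or None."""
    for p in range(1, p_max + 1):
        if condition_11_holds(p, gamma, d0, r):
            return p
    return None


if __name__ == "__main__":
    # gamma > d0 + 1: some finite p works.
    print(smallest_admissible_p(gamma=2.0, d0=0, r=0.0))
    # gamma = d0 + 1: the worst case i = p never becomes negative.
    print(smallest_admissible_p(gamma=1.0, d0=0, r=0.0, p_max=200))
```

Under these assumptions the binding case is $i = p$, where the expression equals $p\,((d_0+1)/\gamma - 1) + r + 2$; it becomes negative for large $p$ exactly when $\gamma > d_0 + 1$.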

Multiple Mixing of Lower Rate. Processes of a lower mixing rate have been studied
by Durieu and Tusche (2012), who consider a multiple mixing condition on $\mathcal{C} = \mathcal{H}_\alpha(\mathbb{R}^d, \mathbb{R})$,
$\alpha \in (0,1]$, where $\theta^{i_q - i_{q-1}}$ in (12) is replaced by a term $\Theta(i_q - i_{q-1})$ with a monotone
decreasing function $\Theta : \mathbb{N} \to \mathbb{R}_+$ such that $\sum_{i=0}^{\infty} i^{2p-2}\, \Theta(i) < \infty$ and $d_0 = 0$. In this case
they were able to prove the validity of Assumption 2 with $\Phi_i = c\, \mathrm{id}^i$, $c > 0$, see Proposition 1
in Durieu and Tusche (2012).

Such a mixing type is given e.g. for multidimensional causal functions of i.i.d. processes.
A causal function of an i.i.d. process $(\xi_i)_{i \in \mathbb{Z}}$ is defined as a process $(X_i)_{i \in \mathbb{N}^*}$ given by $X_i =
G(\xi_i, \xi_{i-1}, \ldots)$, where $G : \mathcal{X}^{\mathbb{N}} \to \mathbb{R}^d$ is a measurable function. The physical measure of
dependence $\delta_{i,m}$ (introduced by Wu (2005)) is defined by

<5i, m := E(|G(£,&_i,... )-<?(&,&-!,..., Su&,£i, ■■ OD™. 

where (£-)iez is an independent copy of (^)iez- Durieu and Tusche (2012) showed that a 
causal function of an i.i.d. process has the aforementioned mixing property with Q(i) = 5f m , 
where m = s/{s — 1). 

As an example consider the following MA-process. Let (£0«ez be an i.i.d. process in a 
normed vector space (y, \\ ■ \\y), let (oj)j 6 n be a family of Revalued linear functional on y, 
|a|* := sup{|a(y)| : \\y\\y < 1}, and define the process pTj)j 6 N by Xi = X^=i a j£i-j- ^ n this 
case we have <5j jm < (2||Xo|| s ) a YlJLi \ a j\* an d thus, assuming that ||Xo|| s and Yl'jLt \ a j\* arc 
finite, (Xi)igR has the multiple mixing property with 6(z) = Yl'jLi \ a j\*- 

Note that as a consequence of working with $\Phi_i = c\, \mathrm{id}^i$ in this setting, in order to satisfy
condition (10) in Theorem 4, we cannot choose $\Psi = \exp(C\, \mathrm{id}^{1/\gamma})$, but need to work with
a polynomial type of $\Psi$, which on the other hand requires stronger types of bracketing
numbers than in condition (F). These can be achieved mainly by using stronger conditions on
$\mu$, or restrictions on $\mathcal{F}$. Observe that the bracketing numbers presented in (ii) and (iii) of
Proposition 1 are actually also available for polynomial $\Psi$ (c.f. Dehling, Durieu, and Tusche
(2012)) without further restrictions on $\mu$ or $\mathcal{F}$. Further, bracketing numbers for indicators of
semifinite rectangles of the type $[-\infty, t]$, $t \in \mathbb{R}^d$, with polynomial $\Psi$ are implicitly given in
Durieu and Tusche (2012) at the cost of stronger assumptions on the marginal distribution $\mu$.

4 Proof of Theorem 4 

The main idea of the proof is to introduce some approximation $U_n^{(q)}$ for the original process
$U_n$, which is based on functions in $\mathcal{G}$ and thus can be controlled by Assumptions 1 and 2.
The approximation can be constructed as follows: For all $q \geq 1$, there exist two sets of
$N_q := N(2^{-q}, \Psi(2^q), \mathcal{F}, \mathcal{G}, L^s(\mu))$ functions $\{g_{q,1}, \ldots, g_{q,N_q}\} \subset \mathcal{G}$ and $\{g'_{q,1}, \ldots, g'_{q,N_q}\} \subset \mathcal{G}$,
such that

\[
\|g_{q,i} - g'_{q,i}\|_s \leq 2^{-q}, \qquad \|g_{q,i}\|_{\mathcal{C}} \leq \Psi(2^q), \qquad \|g'_{q,i}\|_{\mathcal{C}} \leq \Psi(2^q) \tag{13}
\]

and for all $f \in \mathcal{F}$, there exists some $i$ such that $g_{q,i} \leq f \leq g'_{q,i}$. Further, by (9),

\[
\sum_{q \geq 1} 2^{-(r+1)q} N_q^2 < \infty. \tag{14}
\]

To approximate the indexing function $f \in \mathcal{F}$, construct a partition of $\mathcal{F}$ into $N_q$ subsets
$\mathcal{F}_{q,i}$ such that for each $f \in \mathcal{F}_{q,i}$ one has $g_{q,i} \leq f \leq g'_{q,i}$. We use the notation $\pi_q f = g_{q,i^*}$ and
$\pi'_q f = g'_{q,i^*}$, where $i^*$ is the uniquely defined integer such that $f \in \mathcal{F}_{q,i^*}$. To approximate
the time parameter we use the partition of $[0,1]$ into subsets $T_{q,j}$, $j = 1, \ldots, 2^q$, given by
$T_{q,j} := [(j-1)2^{-q}, j2^{-q})$ for $j < 2^q$ and $T_{q,2^q} := [1 - 2^{-q}, 1]$. For $t \in [0,1]$ we define
$\tau_q t := \max\{(j-1)2^{-q} \leq t : j = 1, \ldots, 2^q\}$ and further $\tau'_q t := \tau_q t + 2^{-q}$. We extend the
notation introduced in Section 2 to arbitrary $\mu$-integrable functions $f : \mathcal{X} \to \mathbb{R}$ by setting

\[
F_n(f) := \frac{1}{n} \sum_{i=1}^{n} f(X_i)
\]

and for $t \in [0,1]$

\[
U_n(f,t) := \frac{1}{\sqrt{n}} \sum_{i=1}^{[nt]} \big(f(X_i) - \mu f\big).
\]

For each $q \geq 1$, we introduce the approximating process

\[
U_n^{(q)}(f,t) := U_n(\pi_q f, \tau_q t) = \frac{1}{\sqrt{n}} \sum_{i=1}^{[n \tau_q t]} \big(\pi_q f(X_i) - \mu(\pi_q f)\big).
\]

Note that this process is constant on each $\mathcal{F}_{q,i} \times T_{q,j}$.
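
The dyadic time discretisation $\tau_q t$ and $\tau'_q t$ used in this construction can be made concrete with a small Python sketch. The function names `tau` and `tau_prime` are illustrative and not part of the paper; the code only implements the definition $\tau_q t = \max\{(j-1)2^{-q} \leq t : j = 1, \ldots, 2^q\}$ for $t \in [0,1]$.

```python
import math


def tau(t, q):
    """Left endpoint of the dyadic cell T_{q,j} that contains t in [0, 1].

    The index j - 1 is capped at 2**q - 1 so that t = 1 falls into the
    last (closed) cell [1 - 2**-q, 1].
    """
    j_minus_1 = min(math.floor(t * 2 ** q), 2 ** q - 1)
    return j_minus_1 / 2 ** q


def tau_prime(t, q):
    """Right endpoint of the same dyadic cell: tau_q t + 2**-q."""
    return tau(t, q) + 2.0 ** (-q)


if __name__ == "__main__":
    for t in (0.0, 0.3, 0.75, 1.0):
        print(t, tau(t, 2), tau_prime(t, 2))
```

By construction $0 \leq t - \tau_q t < 2^{-q}$ for $t < 1$, which is the property used when the time increments are bounded in the proof of Proposition 3.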

To draw the connection between the weak asymptotic behaviour of the original process $U_n$
and the approximating process $U_n^{(q)}$, we use an altered version of Theorem 4.2 in Billingsley
(1968, p.25):

Theorem 6. Let $X_n, X_n^{(q)}, X^{(q)}$, $q, n \geq 1$, be random elements with values in the Banach
space $(\ell^\infty(\mathcal{F} \times [0,1]), \|\cdot\|_\infty)$ and suppose that $X^{(q)}$ is measurable and separable 1 . If the
conditions

\[
X_n^{(q)} \rightsquigarrow X^{(q)} \quad \text{as } n \to \infty \text{ for all } q \geq 1, \tag{15}
\]

\[
\limsup_{n \to \infty} P^*\big(\|X_n - X_n^{(q)}\|_\infty > \delta\big) \to 0 \quad \text{as } q \to \infty \text{ for all } \delta > 0 \tag{16}
\]

are satisfied, then there exists an $\ell^\infty(\mathcal{F} \times [0,1])$-valued, separable random variable $X$ such
that $X^{(q)} \rightsquigarrow X$ as $q \to \infty$ and

\[
X_n \rightsquigarrow X \quad \text{as } n \to \infty.
\]

For the proof and further details see Theorem 2.1 in Dehling, Durieu, and Tusche (2012).

We will prove Theorem 4 by establishing conditions (15) and (16) in the following two
propositions:

Proposition 2. For all $q \in \mathbb{N}^*$ the process $(U_n^{(q)}(f,t))_{(f,t) \in \mathcal{F} \times [0,1]}$ converges in distribution
to a piecewise constant Gaussian process $(U^{(q)}(f,t))_{(f,t) \in \mathcal{F} \times [0,1]}$ as $n \to \infty$.

Proposition 3. Assume that Assumption 2 holds for some $p \in \mathbb{N}^*$, $s \geq 1$ and some monotone
increasing functions $\Phi_1, \ldots, \Phi_p : \mathbb{R}_+ \to \mathbb{R}_+$. Moreover, suppose there exists a constant
$r > -1$ and a monotone increasing function $\Psi : \mathbb{R}_+ \to \mathbb{R}_+$ such that (9) holds. If (10)
holds for some non-negative constants $\gamma_i$ satisfying (11), then for all $\varepsilon, \eta > 0$ there exists
some $q_0$ such that for all $q \geq q_0$

\[
\limsup_{n \to \infty} P^*\Big( \sup_{t \in [0,1]} \sup_{f \in \mathcal{F}} \big|U_n(f,t) - U_n^{(q)}(f,t)\big| > \varepsilon \Big) \leq \eta.
\]



1 Since the objects we work with involve suprema over the non-separable space $\ell^\infty(\mathcal{F} \times [0,1])$, measurability
cannot always be guaranteed. Thus we need to use the theory of outer probability as presented in
van der Vaart and Wellner (1996). In this context we call any (not necessarily measurable) function on
a probability space a random element and we call it a random variable if it is also measurable. We denote
the outer probability with respect to a probability measure $P$ by $P^*$. Furthermore, a random variable $X$
with values in some space $S$ is called separable if there exists some separable subset $S'$ of $S$
such that $P(X \in S') = 1$.
Proof of Theorem 4. We can now apply Theorem 6 with $X_n = U_n$, $X_n^{(q)} = U_n^{(q)}$, $X^{(q)} = U^{(q)}$.
By Proposition 2 the convergence (15) holds, while (16) is satisfied due to Proposition 3.
Therefore $U_n$ converges in distribution to an $\ell^\infty(\mathcal{F} \times [0,1])$-valued, separable random variable
$K$. Furthermore, we know that $U^{(q)}$ is a piecewise constant Gaussian process which converges
in distribution to $K$. Thus $K$ is Gaussian, too. Since $\ell^\infty(\mathcal{F} \times [0,1])$ is complete, the
tightness of $K$ follows from the separability (c.f. Lemma 1.3.2 in van der Vaart and Wellner
(1996)). □

Proof of Proposition 2. Since by construction $\pi_q f \in \mathcal{G}$ for all $f \in \mathcal{F}$, due to Assumption 1,
the finite dimensional process $(U_n^{(q)}(f_1,t_1), \ldots, U_n^{(q)}(f_k,t_k))$ converges in distribution to some
multi-dimensional normally distributed random variable $(U^{(q)}(f_1,t_1), \ldots, U^{(q)}(f_k,t_k))$ for all
fixed $k \in \mathbb{N}^*$, $f_1, \ldots, f_k \in \mathcal{F}$, $t_1, \ldots, t_k \in [0,1]$. All $U_n^{(q)}$, $n \in \mathbb{N}^*$, are constant on each
$\mathcal{F}_{q,i} \times T_{q,j}$, $i = 1, \ldots, N_q$, $j = 1, \ldots, 2^q$. Therefore $U^{(q)}$ is constant on all $\mathcal{F}_{q,i} \times T_{q,j}$, too.
Since these sets form a partition of $\mathcal{F} \times [0,1]$, the finite dimensional convergence yields the
convergence in distribution of the whole process $(U_n^{(q)}(f,t))_{(f,t) \in \mathcal{F} \times [0,1]}$. □



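Proof of Proposition 3. Let $\bar{Z} := Z - \mathrm{E} Z$ denote the centring of a random variable $Z$ and
observe that for any random variables $Y_l \leq Y \leq Y_u$ the inequality

\[
|\bar{Y} - \bar{Y}_l| \leq |\bar{Y}_u - \bar{Y}_l| + \mathrm{E}|Y_u - Y_l|
\]

holds. Since for $f \in \mathcal{F}$, $k \in \mathbb{N}$ we have $F_{[nt]}(\pi_{q+k} f) \leq F_{[nt]}(f) \leq F_{[nt]}(\pi'_{q+k} f)$, using
that $\|\cdot\|_1 \leq \|\cdot\|_s$ for $s \geq 1$ and applying (13), we obtain

\[
\begin{aligned}
|U_n(f,t) - U_n(\pi_{q+k} f, t)|
&\leq |U_n(\pi'_{q+k} f, t) - U_n(\pi_{q+k} f, t)| + \frac{[nt]}{\sqrt{n}}\, \mathrm{E}\big|F_{[nt]}(\pi'_{q+k} f - \pi_{q+k} f)\big| \\
&\leq |U_n(\pi'_{q+k} f, t) - U_n(\pi_{q+k} f, t)| + \sqrt{n}\, 2^{-(q+k)}.
\end{aligned} \tag{17}
\]

Moreover, for all $n \geq 2^{q+k}$ and $g \in \mathcal{G}$

\[
\begin{aligned}
|U_n(g,t) - U_n(g, \tau_{q+k} t)| &= \frac{1}{\sqrt{n}} \Big| \sum_{i=[n \tau_{q+k} t]+1}^{[nt]} \big(g(X_i) - \mu g\big) \Big| \\
&\leq 2M n^{-1/2} \big([nt] - [n \tau_{q+k} t]\big) \\
&\leq 4M \sqrt{n}\, 2^{-(q+k)},
\end{aligned} \tag{18}
\]

where $M := \sup\{\|g\|_\infty : g \in \mathcal{G}\}$ is finite by assumption. Analogously to the processes $U_n^{(q)}$,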
we introduce the processes Un 

given by 

U$\f,t) :=U n (ir q f,r' q t). 
An application of the triangle inequality, (17), and (18) yields 

U n (f,t)-Ui q+k \f,t) < U^+ k \f,t)-U( q+k \f,t) +(4M + l)v^2^+ fe . (19) 
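Combining (19) with a telescopic sum argument, one obtains for any $K \geq 1$

\[
\begin{aligned}
\big|U_n(f,t) - U_n^{(q)}(f,t)\big|
&\leq \Big\{ \sum_{k=1}^{K} \big|U_n^{(q+k)}(f,t) - U_n^{(q+k-1)}(f,t)\big| \Big\} + \big|U_n(f,t) - U_n^{(q+K)}(f,t)\big| \\
&\leq \Big\{ \sum_{k=1}^{K} \big|U_n^{(q+k)}(f,t) - U_n^{(q+k-1)}(f,t)\big| \Big\} + \big|U_n^{\prime(q+K)}(f,t) - U_n^{(q+K)}(f,t)\big| \\
&\qquad + (4M+1) \sqrt{n}\, 2^{-(q+K)}.
\end{aligned} \tag{20}
\]

To assure $\varepsilon/4 \leq (4M+1) \sqrt{n}\, 2^{-(q+K)} \leq \varepsilon/2$, choose $K = K_{n,q}$, given by

\[
K_{n,q} := \Big\lfloor \log_2 \Big( \frac{4(4M+1)\sqrt{n}}{2^q\, \varepsilon} \Big) \Big\rfloor.
\]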
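
The choice of $K_{n,q}$ can be checked numerically. The sketch below assumes the reading $K_{n,q} = \lfloor \log_2(4(4M+1)\sqrt{n} / (2^q \varepsilon)) \rfloor$ and verifies that the deterministic remainder $(4M+1)\sqrt{n}\,2^{-(q+K)}$ then lies between $\varepsilon/4$ and $\varepsilon/2$. The function names and the sample parameter values are illustrative.

```python
import math


def choose_K(n, q, eps, M):
    """Assumed form of K_{n,q}: floor of log2(4 (4M + 1) sqrt(n) / (2^q eps))."""
    return math.floor(math.log2(4 * (4 * M + 1) * math.sqrt(n) / (2 ** q * eps)))


def remainder(n, q, eps, M):
    """Deterministic remainder (4M + 1) sqrt(n) 2^{-(q + K)} for K = K_{n,q}."""
    K = choose_K(n, q, eps, M)
    return (4 * M + 1) * math.sqrt(n) * 2.0 ** (-(q + K))


if __name__ == "__main__":
    eps, M = 0.1, 1.0
    for n in (10 ** 3, 10 ** 4, 10 ** 6):
        for q in (1, 3, 5):
            rem = remainder(n, q, eps, M)
            print(n, q, choose_K(n, q, eps, M), eps / 4 <= rem <= eps / 2)
```

The reason is elementary: with $A = 4(4M+1)\sqrt{n}/(2^q\varepsilon)$ and $2^K \leq A < 2^{K+1}$, the remainder equals $(\varepsilon/4)\, A\, 2^{-K}$, which lies in $[\varepsilon/4, \varepsilon/2)$.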



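For each $i = 1, \ldots, N_q$, $j = 1, \ldots, 2^q$, inequality (20) implies

\[
\begin{aligned}
\sup_{t \in T_{q,j}} \sup_{f \in \mathcal{F}_{q,i}} \big|U_n(f,t) - U_n^{(q)}(f,t)\big|
&\leq \Big\{ \sum_{k=1}^{K_{n,q}} \sup_{t \in T_{q,j}} \sup_{f \in \mathcal{F}_{q,i}} \big|U_n^{(q+k)}(f,t) - U_n^{(q+k-1)}(f,t)\big| \Big\} \\
&\qquad + \sup_{t \in T_{q,j}} \sup_{f \in \mathcal{F}_{q,i}} \big|U_n^{\prime(q+K_{n,q})}(f,t) - U_n^{(q+K_{n,q})}(f,t)\big| + \frac{\varepsilon}{2}.
\end{aligned}
\]

Set $\varepsilon_k = \varepsilon/(4k(k+1))$. Then $\sum_{k=1}^{\infty} \varepsilon_k = \varepsilon/4$ and for all $i = 1, \ldots, N_q$ we have

\[
\begin{aligned}
P^*\Big( \sup_{t \in T_{q,j}} \sup_{f \in \mathcal{F}_{q,i}} \big|U_n(f,t) - U_n^{(q)}(f,t)\big| > \varepsilon \Big)
&\leq \sum_{k=1}^{K_{n,q}} P^*\Big( \sup_{t \in T_{q,j}} \sup_{f \in \mathcal{F}_{q,i}} \big|U_n^{(q+k)}(f,t) - U_n^{(q+k-1)}(f,t)\big| > \varepsilon_k \Big) \\
&\qquad + P^*\Big( \sup_{t \in T_{q,j}} \sup_{f \in \mathcal{F}_{q,i}} \big|U_n^{\prime(q+K_{n,q})}(f,t) - U_n^{(q+K_{n,q})}(f,t)\big| > \frac{\varepsilon}{4} \Big).
\end{aligned} \tag{21}
\]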
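
The weights $\varepsilon_k = \varepsilon/(4k(k+1))$ are chosen so that they sum to $\varepsilon/4$, since $1/(k(k+1)) = 1/k - 1/(k+1)$ telescopes. A short exact check with rational arithmetic, using illustrative function names, is:

```python
from fractions import Fraction


def eps_k(eps, k):
    """Weight eps / (4 k (k + 1)) assigned to level k."""
    return eps / (4 * k * (k + 1))


def partial_sum(eps, K):
    """Sum of eps_k for k = 1..K; telescopes to (eps / 4) * (1 - 1 / (K + 1))."""
    return sum(eps_k(eps, k) for k in range(1, K + 1))


if __name__ == "__main__":
    eps = Fraction(1)
    for K in (1, 9, 99):
        print(K, partial_sum(eps, K), eps / 4 * (1 - Fraction(1, K + 1)))
```

Every partial sum stays strictly below $\varepsilon/4$, so the level-wise thresholds $\varepsilon_k$ together with $\varepsilon/4$ and $\varepsilon/2$ never exceed the total budget $\varepsilon$.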



Recall that $(\pi_{q+k}, \tau_{q+k})$ and thus $U_n^{(q+k)}$ and $U_n^{\prime(q+k)}$ are constant on each $\mathcal{F}_{q+k,i} \times T_{q+k,j}$,
$i = 1, \ldots, N_{q+k}$, $j = 1, \ldots, 2^{q+k}$, and thus the suprema on the r.h.s. of inequality (21) are
in fact maxima over finite numbers of functions. Therefore the outer probabilities may be
replaced by usual probabilities here. Now, for each $k \in \mathbb{N}^*$, choose a set $\mathcal{F}(k)$ of at most
$N_{k-1} N_k$ functions in $\mathcal{F}$, such that $\mathcal{F}(k)$ contains at least one function in each non-empty
$\mathcal{F}_{k,i} \cap \mathcal{F}_{k-1,i'}$, $i = 1, \ldots, N_k$, $i' = 1, \ldots, N_{k-1}$. For $q \in \mathbb{N}^*$ and $i \in \{1, \ldots, N_q\}$, define

\[
\mathcal{F}_{k,q,i} := \mathcal{F}_{q,i} \cap \mathcal{F}(q+k),
\]

\[
T_{k,q,j} := \big\{ (j-1)2^{-q} + (m-1)2^{-(q+k)} : m \in \{1, \ldots, 2^k\} \big\}.
\]
Inequality (21) and the triangle inequality imply

P*( sup sup \U n (f,t) - U®(f,t)\ > e) 
\teTa.i feFa.i J 



'9>J f^-Fq,* 



< I ^2 ^2 U n(ir q +kf,T g+k -it) - U n (ir q+ k-if,Tq +k -it) 

I fe=l t£T ktq j f£F ktqd 



~ 2 



P(^U n (7T q+k f, T g+k t) - U n (7T q+k f, T q+k ^it) 



> 



t£T K . 



> 



q,q,l 



+ P{\U n (ir q+Kn J,T q+Kn J) - U n (ir' q+Kn J,T q+Kn j) 
Applying Markov's inequality on the 2p-th moments, we obtain 

P*( sup sup \U n (f,t) - U<?)(f,t)\ > e) 



> 



< i Y (y) P (' E \U n (TT q+k f,T q+k ^it) - U n (ir q+k -lf,T q+ k-it)\ 

{ fc=l teT kiq>j f£F k , q>i 



2p 



+ E|C/ n (7T g+fc /, T q+k t) - U n (lT q+k f, T q+k _it 



\2p 



+ E E 



B\U n {7r q+Kn J,T q+Kn j) - U n (ir q+Knq f,T q+Kn j] 



|2p 



n,q,q,3 



f^ F K n ,q,q,: 



+ B\U n (7T q+Knq f,T q+Kn j) - U n (TT q+Kn>q f,T q+ K n j] 



2p 



(22) 



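We will treat the expected values on the r.h.s. of inequality (22) separately now by using
Assumption 2 and properties of our brackets used to cover $\mathcal{F}$. Recall that by (13) we have

\[
\begin{aligned}
\|\pi_{q+k} f - \pi_{q+k-1} f\|_s &\leq \|\pi_{q+k} f - f\|_s + \|\pi_{q+k-1} f - f\|_s \leq 3 \cdot 2^{-(q+k)}, \\
\|\pi_{q+k} f - \pi'_{q+k} f\|_s &\leq 2^{-(q+k)},
\end{aligned} \tag{23}
\]

\[
\begin{aligned}
\|\pi_{q+k} f - \pi_{q+k-1} f\|_{\mathcal{C}} &\leq 2 \Psi(2^{q+k}), \\
\|\pi_{q+k} f - \pi'_{q+k} f\|_{\mathcal{C}} &\leq 2 \Psi(2^{q+k}).
\end{aligned} \tag{24}
\]

For convenience, throughout the rest of the proof we will write $x \ll y$ if there is some finite
constant $C \in (0, \infty)$ such that $x \leq C y$, where $C$ may only depend on global parameters of
the corresponding statement. Applying successively (8), (23), (24), and (10) we have

\[
\begin{aligned}
\mathrm{E}\big|U_n(\pi_{q+k} f, \tau_{q+k-1} t) - U_n(\pi_{q+k-1} f, \tau_{q+k-1} t)\big|^{2p}
&\leq n^{-p} \sum_{\ell=1}^{p} n^{\ell}\, \|\pi_{q+k} f - \pi_{q+k-1} f\|_s^{\ell}\, \Phi_\ell\big(\|\pi_{q+k} f - \pi_{q+k-1} f\|_{\mathcal{C}}\big) \\
&\ll \sum_{\ell=1}^{p} n^{-(p-\ell)}\, 2^{(\gamma_\ell - \ell)(q+k)}
\end{aligned} \tag{25}
\]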



and analogously 



V\U n (<+K n J,T q+Kn J) ~ U n (ir q+Kn J,r q+Kn j)\ 2p « Y^n-^h^-^*^. 



(26) 



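For fixed $g \in \mathcal{G}$ we have by stationarity

\[
\mathrm{E}\big|U_n(g, \tau_{q+k} t) - U_n(g, \tau_{q+k-1} t)\big|^{2p} = n^{-p}\, \mathrm{E}\Big| \sum_{i=1}^{[n \tau_{q+k} t] - [n \tau_{q+k-1} t]} \big(g(X_i) - \mu g\big) \Big|^{2p}, \tag{27}
\]

where we consider $\sum_{i=1}^{0} \cdots = 0$. Note that by construction $\tau_{q+k} t - \tau_{q+k-1} t \in \{0, 2^{-(q+k)}\}$
for every $t \in [0,1]$ and therefore

\[
[n \tau_{q+k} t] - [n \tau_{q+k-1} t] \leq n 2^{-(q+k)} + 1 \quad \text{for all } n \geq 2^{q+k}.
\]

Applying (8), (13), and (10) to (27) we obtain

\[
\begin{aligned}
\mathrm{E}\big|U_n(\pi_{q+k} f, \tau_{q+k} t) - U_n(\pi_{q+k} f, \tau_{q+k-1} t)\big|^{2p}
&\ll n^{-p} \sum_{\ell=1}^{p} \big(n 2^{-(q+k)}\big)^{\ell}\, \|\pi_{q+k} f\|_s^{\ell}\, \Phi_\ell\big(\|\pi_{q+k} f\|_{\mathcal{C}}\big) \\
&\ll \sum_{\ell=1}^{p} n^{-(p-\ell)}\, 2^{(\gamma_\ell - \ell)(q+k)}
\end{aligned} \tag{28}
\]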



and analogously 



V\U n (n q+Kn J,T> q+Kn j) - U n W q+Kn J,r q+KnJ )\ 2p « J^„-(M)2(T*-i)(^»..). (29) 



Now, apply (25), (26), (28), and (29) to (22). We infer 



U n (f,t)-U^(f,t) 



> e 



P* I sup sup 

« £ #n, q>3 #F k>q>i (fc(fc + p 1))2p £ n-tp- 



*)o(7£ -*)(<?+*) 



(30) 



fe=i 
Recall that by construction of the partitions of $\mathcal{F}$ and $[0,1]$ at the beginning of this section,
we have $\sum_{j=1}^{2^q} \# T_{k,q,j} = 2^{q+k}$ and $\sum_{i=1}^{N_q} \# \mathcal{F}_{k,q,i} = \# \mathcal{F}(q+k) \leq N_{q+k-1} N_{q+k}$. Therefore
(30) yields



P* sup sup 

Vt6[0,l] fdT 



U n (f,t)-U®(f,t) 



> e 



p K ntq 21 N q 

« E E E E rtF^n-tr-^-^ 

£=1 k=l j=l i=l 

« E E ^-^^-(MJa^+i)^). 



£=1 fc=l 

This implies that for any rj > 



P* sup sup 
\te[o,i] /eJ- 



> e 



U n (f,t)-U}*\f,t) 

« ^>-^max{l , 2^-^+ 2 +")("+^, 9 )} ]T AT g+fe _ 1 iV 9+fe A ; 4 f2-( r+1 +'')^+ fe ) 

1=1 k=l 

<max|l , max n i(7^-2 P +r+2+„) j g iVj fc _itf Jk fc 4 *2-( r+1+1 >) fc . (31) 

^ e - 1 '-' p J k=q+l 

By (11) we can choose $\eta$ small enough to assure $\gamma_\ell + \ell - 2p + r + 2 + \eta < 0$ for all $\ell = 1, \ldots, p$.
Thus the factor in front of the sum is uniformly bounded w.r.t. $n$. Using (14), we obtain

\[
\sum_{k=1}^{\infty} N_{k-1} N_k\, k^{4p}\, 2^{-(r+1+\eta)k} \leq \sum_{k=1}^{\infty} 2^{-(r+1)k} N_{k-1}^2 \cdot k^{4p} 2^{-\eta k} + \sum_{k=1}^{\infty} 2^{-(r+1)k} N_k^2 \cdot k^{4p} 2^{-\eta k} < \infty
\]

for sufficiently small $\eta > 0$, which implies that the series in (31) goes to zero as $q \to \infty$. □

5 Proof of Lemma 2 

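For $f \in \mathcal{F}$, recall the definition of the approximating functions $\pi_q f$ in Section 4 and note
that as a consequence of the entropy condition in Theorem 4, we know that for every $q \in \mathbb{N}^*$

\[
\|f - \pi_q f\|_s \leq 2^{-q}, \tag{32}
\]
\[
\|\pi_q f\|_{\mathcal{C}} \leq \Psi(2^q). \tag{33}
\]
Similarly, for all $g \in \mathcal{F}$ and $k \in \mathbb{N}^*$ there exists some $g_k \in \mathcal{G}$ satisfying

\[
\|g_k - g\|_s \leq k^{-\beta}, \tag{34}
\]
\[
\|g_k\|_{\mathcal{C}} \leq \Psi(k^{\beta}). \tag{35}
\]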
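Let $U^{(q)}$ denote the limit process given in Proposition 2. Condition (i) implies that for all
$f, g \in \mathcal{F}$, $t, u \in [0,1]$ and $q \in \mathbb{N}^*$

\[
\mathrm{Cov}\big(U^{(q)}(f,t), U^{(q)}(g,u)\big)
= \min\{t,u\} \Big\{ \sum_{k=0}^{\infty} \mathrm{Cov}\big(\pi_q f(X_0), \pi_q g(X_k)\big) + \sum_{k=1}^{\infty} \mathrm{Cov}\big(\pi_q g(X_0), \pi_q f(X_k)\big) \Big\}.
\]

Since the autocovariance functions of a converging Gaussian process converge to the
autocovariance functions of the limit process, the covariance structure of the limit process $K$ of
$U^{(q)}$ is given by $\mathrm{Cov}(K(f,t), K(g,u)) = \lim_{q \to \infty} \mathrm{Cov}(U^{(q)}(f,t), U^{(q)}(g,u))$. Thus it suffices
to show that

\[
\Big| \sum_{k=0}^{\infty} \Big( \mathrm{Cov}\big(\pi_q f(X_0), \pi_q g(X_k)\big) - \mathrm{Cov}\big(f(X_0), g(X_k)\big) \Big) \Big| \to 0 \tag{36}
\]

and

\[
\Big| \sum_{k=1}^{\infty} \Big( \mathrm{Cov}\big(\pi_q g(X_0), \pi_q f(X_k)\big) - \mathrm{Cov}\big(g(X_0), f(X_k)\big) \Big) \Big| \to 0
\]

as $q \to \infty$.

By symmetry, both series can be treated the same way. Let $k(q) := 2^{q/\beta}$. We consider the
series in line (36). We have

\[
\begin{aligned}
\Big| \sum_{k=0}^{\infty} & \Big( \mathrm{Cov}\big(\pi_q f(X_0), \pi_q g(X_k)\big) - \mathrm{Cov}\big(f(X_0), g(X_k)\big) \Big) \Big| \\
&\leq \sum_{k=0}^{k(q)} \big|\mathrm{Cov}\big(\pi_q f(X_0) - f(X_0), \pi_q g(X_k)\big)\big| + \sum_{k=0}^{k(q)} \big|\mathrm{Cov}\big(f(X_0), \pi_q g(X_k) - g(X_k)\big)\big|
\end{aligned} \tag{37}
\]
\[
\qquad + \sum_{k=k(q)+1}^{\infty} \big|\mathrm{Cov}\big(\pi_q f(X_0) - f(X_0), \pi_q g(X_k)\big)\big| \tag{38}
\]
\[
\qquad + \sum_{k=k(q)+1}^{\infty} \big|\mathrm{Cov}\big(f(X_0), \pi_q g(X_k) - g(X_k)\big)\big|. \tag{39}
\]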


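Let us treat the terms separately. Recall that both $\mathcal{F}$ and $\mathcal{G}$ are uniformly bounded in
$\|\cdot\|_\infty$-norm. For the term in line (37), we know by Hölder's inequality, (32), and the fact
that $\beta > 1$ that

\[
\begin{aligned}
\sum_{k=0}^{k(q)} & \big|\mathrm{Cov}\big(\pi_q f(X_0) - f(X_0), \pi_q g(X_k)\big)\big| + \sum_{k=0}^{k(q)} \big|\mathrm{Cov}\big(f(X_0), \pi_q g(X_k) - g(X_k)\big)\big| \\
&\ll \sum_{k=0}^{k(q)} \big( \|\pi_q f - f\|_s + \|\pi_q g - g\|_s \big) \\
&\ll k(q)\, 2^{-q} = 2^{-(1 - 1/\beta) q} \to 0 \quad \text{as } q \to \infty,
\end{aligned}
\]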
where again, we write $x \ll y$ if there is a constant $C \in (0, \infty)$ depending only on global
parameters such that $x \leq C y$. For the term in line (38), by (ii), (32), and (33) we obtain

\[
\begin{aligned}
\sum_{k=k(q)+1}^{\infty} \big|\mathrm{Cov}\big(\pi_q f(X_0) - f(X_0), \pi_q g(X_k)\big)\big|
&\ll \|\pi_q f - f\|_s \sum_{k=k(q)+1}^{\infty} \|\pi_q g\|_{\mathcal{C}}\, \Theta(k) \\
&\ll 2^{-q} \sum_{k=k(q)+1}^{\infty} \Psi(2^q)\, \Theta(k) \to 0 \quad \text{as } q \to \infty,
\end{aligned}
\]

where we used that $\Psi$ is increasing and condition (ii) in the last step. It only remains to
show that the term in line (39) goes to zero as $q \to \infty$. We have

\[
\sum_{k=k(q)+1}^{\infty} \big|\mathrm{Cov}\big(f(X_0), \pi_q g(X_k) - g(X_k)\big)\big|
\leq \sum_{k=k(q)+1}^{\infty} \big|\mathrm{Cov}\big(f(X_0), \pi_q g(X_k) - g_k(X_k)\big)\big| \tag{40}
\]
\[
\qquad + \sum_{k=k(q)+1}^{\infty} \big|\mathrm{Cov}\big(f(X_0), g_k(X_k) - g(X_k)\big)\big|. \tag{41}
\]

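First, consider the term in line (40). By (ii), (33), and (35)

\[
\begin{aligned}
\sum_{k=k(q)+1}^{\infty} \big|\mathrm{Cov}\big(f(X_0), \pi_q g(X_k) - g_k(X_k)\big)\big|
&\ll \sum_{k=k(q)+1}^{\infty} \|f\|_s\, \|\pi_q g - g_k\|_{\mathcal{C}}\, \Theta(k) \\
&\ll \Big( \sum_{k=k(q)+1}^{\infty} \|\pi_q g\|_{\mathcal{C}}\, \Theta(k) \Big) + \Big( \sum_{k=k(q)+1}^{\infty} \|g_k\|_{\mathcal{C}}\, \Theta(k) \Big) \\
&\ll \Big( \sum_{k=k(q)+1}^{\infty} \Psi(2^q)\, \Theta(k) \Big) + \Big( \sum_{k=k(q)+1}^{\infty} \Psi(k^{\beta})\, \Theta(k) \Big) \to 0 \quad \text{as } q \to \infty,
\end{aligned}
\]

where we used that $\Psi$ is increasing and applied condition (ii) in the last line. To treat the
term in line (41), we use Hölder's inequality and (34). We obtain

\[
\sum_{k=k(q)+1}^{\infty} \big|\mathrm{Cov}\big(f(X_0), g_k(X_k) - g(X_k)\big)\big| \ll \sum_{k=k(q)+1}^{\infty} \|g_k - g\|_s
\ll \sum_{k=k(q)+1}^{\infty} k^{-\beta} \to 0 \quad \text{as } q \to \infty,
\]

since $\beta > 1$ and thus $\sum_{k=1}^{\infty} k^{-\beta} < \infty$, which completes the proof. □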
6 Proof of Theorem 1

Let $(X_i)$ and $(\mathcal{B}, \|\cdot\|)$ be the Markov chain and the Banach space introduced in Section 2.2.
To prove Theorem 1, we shall apply Theorem 5. We begin by showing that Assumption 1
holds with covariance structure (2). To this aim, we will partially follow the lines of the
proof of Theorem A in Hennion and Hervé (2001). For a measurable real-valued function $f$
on $\mathcal{X}$ and a real number $t \in [0,1]$, we introduce the notation

\[
S_n(f,t) := \sum_{i=1}^{[nt]} f(X_i).
\]

The following proposition gives a sequential finite dimensional CLT under (A) - (D).

Proposition 4. Suppose that for some $m \in [1, \infty]$, (A), (B), (C) hold. Let $k$ be a positive
integer and $t_1, \ldots, t_k \in [0,1]$. Let $f_1, \ldots, f_k$ be real-valued functions on $\mathcal{X}$ such that
$\nu(|f_i|^2) < \infty$ and (D) holds for the space $\mathcal{C} = \mathrm{Vect}_{\mathbb{R}}(f_1, \ldots, f_k)$, the smallest real vector
space containing $f_1, \ldots, f_k$. Then, we have

\[
\frac{1}{\sqrt{n}} \big( S_n(f_1 - \nu f_1, t_1), \ldots, S_n(f_k - \nu f_k, t_k) \big) \rightsquigarrow N(0, \Sigma) \quad \text{as } n \to \infty,
\]

where $N(0, \Sigma)$ is a normal distribution in $\mathbb{R}^k$ with mean $0$ and covariance matrix $\Sigma =
(\Sigma_{i,j})_{1 \leq i,j \leq k}$. If furthermore $f_1, \ldots, f_k \in L^s(\nu) \cap \mathcal{B}$ with $s = m/(m-1)$, then the covariance
matrix is given by

\[
\Sigma_{i,j} = \min\{t_i, t_j\} \Big\{ \sum_{l=0}^{\infty} \mathrm{Cov}\big(f_i(X_0), f_j(X_l)\big) + \sum_{l=1}^{\infty} \mathrm{Cov}\big(f_j(X_0), f_i(X_l)\big) \Big\}.
\]

This proposition will show that Assumption 1 holds with covariance structure (2) since
by assumption, $\mathcal{G}$ consists only of bounded real-valued functions from $\mathcal{B}$.

Proof. First let $f$ be a function as in the statement of the proposition. By the Perturbation
Theorem (see Theorem III.8 in Hennion and Hervé (2001)), there exist a neighbourhood $I_f$
of $0$ and $0 < \theta < \eta < 1$ such that for all $t \in I_f$, there exist operators $\Pi_{f,t}$ and $N_{f,t}$ and
complex numbers $\lambda_{f,t}$ such that

\[
P_{f,t} = \lambda_{f,t} \Pi_{f,t} + N_{f,t}
\]

with

\[
\Pi_{f,t}^2 = \Pi_{f,t}, \quad N_{f,t} \circ \Pi_{f,t} = \Pi_{f,t} \circ N_{f,t} = 0, \quad \rho(N_{f,t}) \leq \theta, \quad |\lambda_{f,t}| \geq \eta \quad \text{for all } t \in I_f.
\]

Moreover, $\lambda_{f,0} = 1$, $\Pi_{f,0} = \Pi$, $N_{f,0} = N$ and the maps $t \mapsto \lambda_{f,t}$, $t \mapsto \Pi_{f,t}$ and $t \mapsto N_{f,t}$ have
continuous second derivatives on $I_f$. We thus have for all $n \geq 1$

\[
P_{f,t}^n = \lambda_{f,t}^n \Pi_{f,t} + N_{f,t}^n.
\]

Further, if $\nu(f) = 0$, by Lemma IV.4' in Hennion and Hervé (2001) the Taylor expansion of
$\lambda_{f,t}$ as $t$ goes to $0$ is given by

\[
\lambda_{f,t} = 1 - \frac{\sigma_f^2}{2} t^2 + o(t^2) \tag{42}
\]
with

\[
\sigma_f^2 := \lim_{n \to \infty} \frac{1}{n}\, \mathrm{E}\big(S_n(f,1)^2\big). \tag{43}
\]

These are the main ingredients to derive a CLT for the process $(f(X_i))_{i \geq 0}$. Here we want
to show a finite dimensional sequential CLT. Without loss of generality we will treat the
case $k = 2$. By the Cramér-Wold device, it is sufficient to prove the convergence of the real
linear combinations $a_1 n^{-1/2} S_n(f_1, t_1) + a_2 n^{-1/2} S_n(f_2, t_2)$ of any square $\nu$-integrable functions
$f_1, f_2 \in \mathcal{C}$ to a normal distribution. Since for $t_1 < t_2$, the preceding term is equal to

\[
n^{-1/2} \Big( S_n(a_1 f_1 + a_2 f_2, t_1) + \sum_{i=[n t_1]+1}^{[n t_2]} a_2 f_2(X_i) \Big)
\]

and $\mathcal{C}$ is a real vector space, it is sufficient to show the convergence of all sums of the form
$n^{-1/2} S_n^*(f,g,s)$ with

\[
S_n^*(f,g,s) = \sum_{i=1}^{[ns]} f(X_i) + \sum_{i=[ns]+1}^{n} g(X_i),
\]

where $f, g \in \mathcal{C}$, $s \in (0,1)$. So, fix $f, g \in \mathcal{C}$, $s \in (0,1)$ and set $S_n^* = S_n^*(f,g,s)$. The following
lemma gives us an expression of the corresponding characteristic function.

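Lemma 3. For every function $\varphi \in \mathcal{B}$, $t \in \mathbb{R}$, and $n \geq 1$,

\[
\mathrm{E}\big( e^{i t S_n^*} \varphi(X_n) \big) = \nu\big( P_{f,t}^{[ns]} P_{g,t}^{n-[ns]} \varphi \big).
\]
In particular, the characteristic function of $n^{-1/2} S_n^*$ is given by

\[
\mathrm{E}\big( e^{i t n^{-1/2} S_n^*} \big) = \nu\Big( P_{f,t/\sqrt{n}}^{[ns]}\, P_{g,t/\sqrt{n}}^{n-[ns]}\, 1_{\mathcal{X}} \Big). \tag{44}
\]

Proof of Lemma 3. For every $k \geq 1$ and every measurable function $F : \mathcal{X}^{k-1} \to \mathbb{R}$, we have

\[
\mathrm{E}\big( e^{i t (F(X_1, \ldots, X_{k-1}) + f(X_k))} \varphi(X_k) \big) = \mathrm{E}\Big( e^{i t F(X_1, \ldots, X_{k-1})}\, \mathrm{E}\big( e^{i t f(X_k)} \varphi(X_k) \,\big|\, X_{k-1}, \ldots, X_1 \big) \Big)
\]

and the same equation with $g$ instead of $f$. The Lemma can now be proved by induction. □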
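
The operator formula of Lemma 3 can be sanity-checked on a toy example. The sketch below uses a two-state chain with an arbitrary transition matrix `P` and initial law `nu` (both illustrative, not taken from the paper) and assumes that the Fourier kernel acts as $P_{h,t}\varphi(x) = \sum_y P(x,y)\, e^{i t h(y)} \varphi(y)$, which is the usual definition but is not restated in this section. It compares $\nu(P_{f,t}^{[ns]} P_{g,t}^{n-[ns]} 1_{\mathcal{X}})$ with a brute-force expectation over all paths.

```python
import cmath
import itertools
import math

P = [[0.7, 0.3], [0.2, 0.8]]   # illustrative transition matrix
nu = [0.4, 0.6]                # its stationary distribution


def fourier_kernel(h, t):
    """Matrix of the assumed operator P_{h,t}: entry (x, y) = P(x, y) exp(i t h(y))."""
    return [[P[x][y] * cmath.exp(1j * t * h[y]) for y in range(2)] for x in range(2)]


def apply(kernel, phi):
    return [sum(kernel[x][y] * phi[y] for y in range(2)) for x in range(2)]


def char_fn_operator(f, g, t, n, s):
    m = math.floor(n * s)
    phi = [1.0, 1.0]
    for _ in range(n - m):          # innermost factors: P_{g,t}^{n-[ns]}
        phi = apply(fourier_kernel(g, t), phi)
    for _ in range(m):              # outer factors: P_{f,t}^{[ns]}
        phi = apply(fourier_kernel(f, t), phi)
    return sum(nu[x] * phi[x] for x in range(2))


def char_fn_bruteforce(f, g, t, n, s):
    m = math.floor(n * s)
    total = 0.0
    for path in itertools.product(range(2), repeat=n + 1):   # (X_0, ..., X_n)
        prob = nu[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        S = sum(f[path[i]] for i in range(1, m + 1))
        S += sum(g[path[i]] for i in range(m + 1, n + 1))
        total += prob * cmath.exp(1j * t * S)
    return total


if __name__ == "__main__":
    f, g = [0.0, 1.0], [1.0, -0.5]
    print(char_fn_operator(f, g, 0.7, 5, 0.5))
    print(char_fn_bruteforce(f, g, 0.7, 5, 0.5))
```

The two values agree up to floating-point error, which mirrors the induction argument: each conditioning step peels off one factor $P_{f,t}$ or $P_{g,t}$.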
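To study the weak convergence of $S_n^*/\sqrt{n}$, we have to compute the limit of

\[
\begin{aligned}
P_{f,t/\sqrt{n}}^{[ns]}\, P_{g,t/\sqrt{n}}^{n-[ns]}\, 1_{\mathcal{X}}
&= \lambda_{f,t/\sqrt{n}}^{[ns]}\, \lambda_{g,t/\sqrt{n}}^{n-[ns]}\, \Pi_{f,t/\sqrt{n}} \Pi_{g,t/\sqrt{n}} 1_{\mathcal{X}}
+ \lambda_{f,t/\sqrt{n}}^{[ns]}\, \Pi_{f,t/\sqrt{n}} N_{g,t/\sqrt{n}}^{n-[ns]} 1_{\mathcal{X}} \\
&\quad + \lambda_{g,t/\sqrt{n}}^{n-[ns]}\, N_{f,t/\sqrt{n}}^{[ns]} \Pi_{g,t/\sqrt{n}} 1_{\mathcal{X}}
+ N_{f,t/\sqrt{n}}^{[ns]} N_{g,t/\sqrt{n}}^{n-[ns]} 1_{\mathcal{X}}.
\end{aligned}
\]

By (42), we infer that

\[
\lambda_{f,t/\sqrt{n}}^{[ns]} \to \exp\Big( -\frac{t^2}{2}\, s\, \sigma_f^2 \Big) \quad \text{and} \quad \lambda_{g,t/\sqrt{n}}^{n-[ns]} \to \exp\Big( -\frac{t^2}{2}\, (1-s)\, \sigma_g^2 \Big) \quad \text{as } n \to \infty,
\]

where $\sigma_f$ and $\sigma_g$ are given by (43). Further, since $\rho(N_{f,t}) < 1$ and $\rho(N_{g,t}) < 1$, we have
that $\|N_{f,t}^n\|_{\mathcal{L}(\mathcal{B})} \to 0$ and $\|N_{g,t}^n\|_{\mathcal{L}(\mathcal{B})} \to 0$ uniformly in $t \in I_f \cap I_g$ as $n \to \infty$. By continuity,
we also have $\Pi_{f,t/\sqrt{n}} 1_{\mathcal{X}} \to 1_{\mathcal{X}}$ and $\Pi_{g,t/\sqrt{n}} 1_{\mathcal{X}} \to 1_{\mathcal{X}}$ as $n \to \infty$. We therefore obtain

\[
\begin{aligned}
\big\| \lambda_{f,t/\sqrt{n}}^{[ns]}\, \Pi_{f,t/\sqrt{n}} N_{g,t/\sqrt{n}}^{n-[ns]} 1_{\mathcal{X}} \big\| &\leq \big| \lambda_{f,t/\sqrt{n}}^{[ns]} \big|\, \big\| \Pi_{f,t/\sqrt{n}} \big\|\, \big\| N_{g,t/\sqrt{n}}^{n-[ns]} \big\|\, \| 1_{\mathcal{X}} \| \to 0, \\
\big\| \lambda_{g,t/\sqrt{n}}^{n-[ns]}\, N_{f,t/\sqrt{n}}^{[ns]} \Pi_{g,t/\sqrt{n}} 1_{\mathcal{X}} \big\| &\leq \big| \lambda_{g,t/\sqrt{n}}^{n-[ns]} \big|\, \big\| N_{f,t/\sqrt{n}}^{[ns]} \big\|\, \big\| \Pi_{g,t/\sqrt{n}} \big\|\, \| 1_{\mathcal{X}} \| \to 0, \\
\big\| N_{f,t/\sqrt{n}}^{[ns]} N_{g,t/\sqrt{n}}^{n-[ns]} 1_{\mathcal{X}} \big\| &\leq \big\| N_{f,t/\sqrt{n}}^{[ns]} \big\|\, \big\| N_{g,t/\sqrt{n}}^{n-[ns]} \big\|\, \| 1_{\mathcal{X}} \| \to 0,
\end{aligned}
\]

and

\[
\lambda_{f,t/\sqrt{n}}^{[ns]}\, \lambda_{g,t/\sqrt{n}}^{n-[ns]}\, \Pi_{f,t/\sqrt{n}} \Pi_{g,t/\sqrt{n}} 1_{\mathcal{X}} \to \exp\Big( -\frac{t^2}{2}\, s\, \sigma_f^2 \Big) \exp\Big( -\frac{t^2}{2}\, (1-s)\, \sigma_g^2 \Big) 1_{\mathcal{X}}
\]

as $n \to \infty$. Thus we infer

\[
\lim_{n \to \infty} P_{f,t/\sqrt{n}}^{[ns]}\, P_{g,t/\sqrt{n}}^{n-[ns]}\, 1_{\mathcal{X}} = \exp\Big( -\frac{t^2}{2} \big( s \sigma_f^2 + (1-s) \sigma_g^2 \big) \Big) 1_{\mathcal{X}},
\]

which, using (44), gives the weak convergence of $S_n^*/\sqrt{n}$ to a centred normal distribution with
variance given by $\sigma_{f,g,s}^2 = s \sigma_f^2 + (1-s) \sigma_g^2$. By (43), we obtain that Proposition 4 holds with
the covariance matrix $\Sigma$ given by

\[
\Sigma_{i,j} = \min\{t_i, t_j\} \cdot \frac{1}{2} \big( \sigma_{f_i + f_j}^2 - \sigma_{f_i}^2 - \sigma_{f_j}^2 \big). \tag{45}
\]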

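Lemma 4. Under the conditions (A), (B), and (C), for all $f \in \mathcal{B}$ and all $g \in L^s(\nu)$, with
$s = \frac{m}{m-1}$, we have

\[
|\mathrm{Cov}(g(X_0), f(X_k))| \leq C \|g\|_s \|f\|_{\mathcal{B}}\, \theta^k.
\]

Proof. Applying successively Hölder's inequality, (B), and (C), we get

\[
\begin{aligned}
|\mathrm{Cov}(g(X_0), f(X_k))| &\leq \mathrm{E}\big| g(X_0)\, \mathrm{E}\big( f(X_k) - \nu f \,\big|\, X_0 \big) \big| \\
&\leq \|g\|_s\, \| P^k f - (\nu f) 1_{\mathcal{X}} \|_{\mathcal{B}} \\
&\leq C \|g\|_s \|f\|_{\mathcal{B}}\, \theta^k.
\end{aligned}
\]

□

The preceding Lemma shows that the series $\sum_{k=0}^{\infty} \mathrm{Cov}(f(X_0), f(X_k))$ converges for $f \in
\mathcal{B} \cap L^s(\nu)$. Thus, using Kronecker's Lemma, equation (43) becomes

\[
\sigma_f^2 = \mathrm{Var}(f(X_0)) + 2 \sum_{k \geq 1} \mathrm{Cov}(f(X_0), f(X_k)),
\]

which, with (45), completes the proof of Proposition 4. □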
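
To illustrate the variance formula and the geometric covariance decay of Lemma 4, consider a two-state Markov chain with transition probabilities $a = P(0,1)$ and $b = P(1,0)$. This toy chain is an illustration and not an example from the paper. Its second eigenvalue is $\lambda = 1 - a - b$, every centred function is an eigenfunction for $\lambda$, so $\mathrm{Cov}(f(X_0), f(X_k)) = \mathrm{Var}(f(X_0))\, \lambda^k$ and the series has the closed form $\mathrm{Var}(f(X_0))\,(1+\lambda)/(1-\lambda)$. The Python sketch below compares a truncated series with this closed form; all function names are illustrative.

```python
def stationary_distribution(a, b):
    """Stationary law of the chain with P(0,1) = a and P(1,0) = b."""
    return (b / (a + b), a / (a + b))


def long_run_variance_series(a, b, f, terms=200):
    """Var(f(X_0)) + 2 * sum_{k=1}^{terms} Cov(f(X_0), f(X_k)) via iterating P."""
    nu = stationary_distribution(a, b)
    P = [[1 - a, a], [b, 1 - b]]
    mean = nu[0] * f[0] + nu[1] * f[1]
    g = [f[0] - mean, f[1] - mean]            # centred function
    total = sum(nu[x] * g[x] * g[x] for x in range(2))
    h = list(g)
    for _ in range(terms):
        h = [P[x][0] * h[0] + P[x][1] * h[1] for x in range(2)]   # h = P^k g
        total += 2 * sum(nu[x] * g[x] * h[x] for x in range(2))
    return total


def long_run_variance_closed_form(a, b, f):
    nu = stationary_distribution(a, b)
    mean = nu[0] * f[0] + nu[1] * f[1]
    var = nu[0] * (f[0] - mean) ** 2 + nu[1] * (f[1] - mean) ** 2
    lam = 1 - a - b
    return var * (1 + lam) / (1 - lam)


if __name__ == "__main__":
    print(long_run_variance_series(0.3, 0.2, (0.0, 1.0)))
    print(long_run_variance_closed_form(0.3, 0.2, (0.0, 1.0)))
```

Here $|\lambda|$ plays the role of the rate $\theta$ in Lemma 4.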
Proof of Theorem 1. Now, take $\mathcal{C} = \mathrm{Vect}_{\mathbb{R}}(\mathcal{G})$. By Proposition 4, Assumption 1 is satisfied.
In order to apply Theorem 5 to prove Theorem 1, it remains to show the multiple mixing
property of $(X_i)_{i \in \mathbb{N}}$. The following lemma, which is basically Lemma 3 in Dehling and Durieu
(2011), gives this property w.r.t. the Banach space $\mathcal{B}$ containing $\mathcal{C}$.

Lemma 5. Under the conditions (A), (B), (C), and (E), $(X_i)_{i \in \mathbb{N}}$ satisfies the multiple mixing
property w.r.t. $\mathcal{B}$ with $d_0 = 0$ and $s = m/(m-1)$.

Proof. Let $f \in \mathcal{B}$ such that $\|f\|_\infty \leq 1$ and set $s = m/(m-1)$. For all $p \geq q > 0$, for
all $i_0 \leq i_1 \leq \ldots \leq i_p$, we write $g = f\, P^{i_{q+1} - i_q}\big( f\, P^{i_{q+2} - i_{q+1}}( \cdots ( f\, P^{i_p - i_{p-1}}(f) ) ) \big)$. By (E), $g$
belongs to $\mathcal{B}$. Using Hölder's inequality, we obtain

\[
\big|\mathrm{Cov}\big( f(X_{i_0}) \cdots f(X_{i_{q-1}}),\, f(X_{i_q}) \cdots f(X_{i_p}) \big)\big|
\leq \big\| f(X_{i_0}) \cdots f(X_{i_{q-1}}) \big\|_s\, \big\| P^{i_q - i_{q-1}}(g) - \nu(g) \big\|_m.
\]

Using (B), (C), and $\|f\|_\infty \leq 1$, we infer

\[
\big|\mathrm{Cov}\big( f(X_{i_0}) \cdots f(X_{i_{q-1}}),\, f(X_{i_q}) \cdots f(X_{i_p}) \big)\big| \leq K \|f\|_s \|g\|_{\mathcal{B}}\, \theta^{i_q - i_{q-1}}.
\]

Now, since the spectral radius of $P \in \mathcal{L}(\mathcal{B})$ is 1, there exists a $c > 1$, which does not depend
on $f$, such that $\|P^k f\|_{\mathcal{B}} \leq c \|f\|_{\mathcal{B}}$ for all $k \in \mathbb{N}^*$. By (E), we obtain two constants $C > 0$
and $\ell \in \mathbb{N}^*$, depending only on $p$, such that $\|g\|_{\mathcal{B}} \leq C \|f\|_{\mathcal{B}}^{\ell}$. This completes the proof of the
lemma. □

To conclude the proof of Theorem 1, observe that the extra assumptions in Theorem 5,
which concern the covariance structure of the limit process, are satisfied due to Proposition 4
and Lemma 4. □



7 Appendix 

Consider the natural generalization of the process $R_n$ introduced in Section 1. Let $(X_i)_{i \in \mathbb{N}}$
be an $\mathcal{X}$-valued stationary process with empirical measure $F_n(f) := n^{-1} \sum_{i=1}^{n} f(X_i)$, $n \in \mathbb{N}^*$.
We set $F_0(f) = 0$. For $j \in \{1, \ldots, n\}$ we define $F_{j,n}(f) := (n - j + 1)^{-1} \sum_{i=j}^{n} f(X_i)$ and
set $F_{n+1,n}(f) := 0$. Consider the $\ell^\infty(\mathcal{F} \times [0,1])$-valued process $R_n = (R_n(f,t))_{(f,t) \in \mathcal{F} \times [0,1]}$
given by

\[
R_n(f,t) := \sqrt{n}\, \frac{[nt]}{n} \Big( 1 - \frac{[nt]}{n} \Big) \big( F_{[nt]}(f) - F_{[nt]+1,n}(f) \big).
\]
The following theorem gives the asymptotic distribution of $R_n$.

Theorem 7. Assume that $(X_i)_{i \in \mathbb{N}}$ satisfies the sequential empirical CLT with indexing class
$\mathcal{F}$ and limit process $K$, that is, $U_n \rightsquigarrow K$ in $\ell^\infty(\mathcal{F} \times [0,1])$ as $n \to \infty$, where $K$ denotes a
tight centred Gaussian process. Then

\[
R_n \rightsquigarrow \big( K(f,t) - t K(f,1) \big)_{(f,t) \in \mathcal{F} \times [0,1]}
\]

in $\ell^\infty(\mathcal{F} \times [0,1])$ as $n \to \infty$.
Proof. Let $\mu$ denote the distribution of the $X_i$. For $t \in [1/n, 1)$ we have

\[
\begin{aligned}
F_{[nt]}(f) - F_{[nt]+1,n}(f)
&= \frac{1}{[nt]} \sum_{i=1}^{[nt]} \big( f(X_i) - \mu f \big) - \frac{1}{n - [nt]} \sum_{i=[nt]+1}^{n} \big( f(X_i) - \mu f \big) \\
&= \Big( \frac{1}{[nt]} + \frac{1}{n - [nt]} \Big) \sum_{i=1}^{[nt]} \big( f(X_i) - \mu f \big) - \frac{1}{n - [nt]} \sum_{i=1}^{n} \big( f(X_i) - \mu f \big) \\
&= \frac{1}{\sqrt{n}}\, \frac{n}{[nt]}\, \frac{n}{n - [nt]}\, U_n(f,t) - \frac{1}{\sqrt{n}}\, \frac{n}{n - [nt]}\, U_n(f,1).
\end{aligned} \tag{46}
\]

Further, by definition we have $R_n(f,1) = 0$ and $R_n(f,t) = 0$ for $t \in [0, 1/n)$. Since also
$U_n(f,t) = 0$ for $t \in [0, 1/n)$, we obtain with (46) that

\[
\begin{aligned}
R_n(f,t) &= U_n(f,t) - \frac{[nt]}{n}\, U_n(f,1) \\
&= U_n(f,t) - t\, U_n(f,1) + \frac{nt - [nt]}{n}\, U_n(f,1) \quad \text{for all } t \in [0,1].
\end{aligned} \tag{47}
\]

Let $A_n$ denote the $\mathcal{F} \times [0,1]$-indexed process given by $A_n(f,t) := ((nt - [nt])/n)\, U_n(f,1)$.
Since $\sup_{t \in [0,1]} |(nt - [nt])/n| \to 0$ as $n \to \infty$, by Slutsky's theorem and the sequential
empirical CLT, $A_n$ converges in distribution (and thus in probability) to zero. Another
application of Slutsky's theorem and the sequential empirical CLT to (47) yields

\[
R_n = \big( U_n(f,t) - t\, U_n(f,1) \big)_{(f,t) \in \mathcal{F} \times [0,1]} + A_n \rightsquigarrow \big( K(f,t) - t K(f,1) \big)_{(f,t) \in \mathcal{F} \times [0,1]}.
\]

Here we have applied the continuous mapping theorem in the final step. □

Remark 5. Note that, in the setting of Theorem 1 and Theorem 3 and for a wide class of 
multiple mixing processes (c.f. Theorem 5), the covariance structure of K is given by (2). 

An application of the continuous mapping theorem with the supremum-functional to the 
above theorem yields the following proposition about the asymptotic distribution of the test 
statistic T n . 

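Proposition 5. If $(X_i)_{i \in \mathbb{N}^*}$ satisfies the sequential empirical CLT, then under the null
hypothesis $H_0$ we have the convergence

\[
T_n := \max_{0 \leq k \leq n}\, \sup_{f \in \mathcal{F}}\, \frac{k}{n} \Big( 1 - \frac{k}{n} \Big) \sqrt{n}\, \big| F_k(f) - F_{k+1,n}(f) \big| \rightsquigarrow \sup_{f \in \mathcal{F},\, t \in [0,1]} \big| K(f,t) - t K(f,1) \big|.
\]

Proof. $R_n(f, \cdot)$ is obviously constant on the intervals $[k/n, (k+1)/n)$, $k = 0, \ldots, n-1$,
and further $R_n(f, k/n) = (k/n)(1 - k/n) \sqrt{n}\, (F_k(f) - F_{k+1,n}(f))$ for $k = 0, \ldots, n$. Thus
$T_n = \sup_{f \in \mathcal{F},\, t \in [0,1]} |R_n(f,t)|$ and we can apply the continuous mapping theorem with

\[
\ell^\infty(\mathcal{F} \times [0,1]) \to \mathbb{R}, \qquad \varphi \mapsto \sup_{f \in \mathcal{F},\, t \in [0,1]} |\varphi(f,t)|.
\]

□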
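
For a finite sample the statistic $T_n$ can be computed directly from its definition. The following Python sketch does this for a finite subclass of indicator functions $1_{(-\infty, c]}$; the restriction to a finite grid of thresholds and all function names are simplifying assumptions made for illustration. It also checks the algebraic identity behind (47), $R_n(f, k/n) = U_n(f, k/n) - (k/n)\, U_n(f,1)$, using Uniform(0,1) data so that $\mu 1_{(-\infty,c]} = c$ is known.

```python
import math
import random


def indicator_values(xs, threshold):
    """Values f(X_i) for f = indicator of (-inf, threshold]."""
    return [1.0 if x <= threshold else 0.0 for x in xs]


def R_n(vals, k):
    """R_n(f, k/n): weighted difference of the means before and after index k."""
    n = len(vals)
    if k == 0 or k == n:
        return 0.0                      # the weight (k/n)(1 - k/n) vanishes
    head = sum(vals[:k]) / k            # F_k(f)
    tail = sum(vals[k:]) / (n - k)      # F_{k+1,n}(f)
    return math.sqrt(n) * (k / n) * (1 - k / n) * (head - tail)


def U_n(vals, mu_f, k):
    """Sequential empirical process U_n(f, k/n) for a known mean mu_f."""
    n = len(vals)
    return (sum(vals[:k]) - k * mu_f) / math.sqrt(n)


def T_n(xs, thresholds):
    """Maximum of |R_n(f, k/n)| over k = 0..n and the given indicator functions."""
    n = len(xs)
    return max(
        abs(R_n(indicator_values(xs, c), k))
        for c in thresholds
        for k in range(n + 1)
    )


if __name__ == "__main__":
    random.seed(0)
    xs = [random.random() for _ in range(200)]
    print(T_n(xs, [0.1 * j for j in range(1, 10)]))
```

Note that $R_n$ does not involve $\mu f$, so $T_n$ can be evaluated without knowing the marginal distribution; $\mu f$ only enters through $U_n$ in the identity check.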